Abstract. Consistent query answering is the problem of computing the answers from a database that are consistent with respect to certain integrity constraints that the database as a whole may fail to satisfy. Those answers are characterized as those that are invariant under minimal forms of restoring the consistency of the database. In this context, we study the problem of repairing databases by fixing integer numerical values at the attribute level with respect to denial and aggregation constraints. We introduce a quantitative definition of database fix, and investigate the complexity of several decision and optimization problems, including DFP, i.e. the existence of fixes within a given distance from the original instance, and CQA, i.e. deciding consistency of answers to aggregate conjunctive queries under different semantics. We provide sharp complexity bounds, identify relevant tractable cases, and introduce approximation algorithms for some of those that are intractable. More specifically, we obtain results like undecidability of existence of fixes for aggregation constraints; MAXSNP-hardness of DFP, but a good approximation algorithm for a relevant special case; and intractability but good approximation for CQA for aggregate queries for one database atom denials (plus built-ins).


1 Introduction 

Integrity constraints (ICs) are used to impose semantics on a database with the purpose of making the database an accurate model of an application domain. Database management systems or application programs enforce the satisfaction of the ICs by rejecting undesirable updates or executing additional compensating actions. However, there are many situations where we need to interact with databases that are inconsistent in the sense that they do not satisfy certain desirable ICs. In this context, an important problem in database research consists in characterizing and retrieving consistent data from inconsistent databases [4], in particular consistent answers to queries. From the logical point of view, consistently answering a query posed to an inconsistent database amounts to evaluating the truth of a formula against a particular class of first-order structures [2], as opposed to the usual process of truth evaluation in a single structure (the relational database).

* Dedicated to the memory of Alberto Mendelzon. Our research on this topic started with conversations between Loreto Bravo and him. Alberto was always generous with his time, advice and ideas; our community is already missing him very much.

** Also: University of Manchester, Department of Computer Science, UK.

Certain database applications, like census, demographic, financial, and experimental data, contain quantitative data, usually associated with nominal or qualitative data, e.g. the number of children associated with a household identification code (or address), or measurements associated with a sample identification code. Usually this kind of data contains errors or mistakes with respect to certain semantic constraints. For example, a census form for a particular household may be considered incorrect if the number of children exceeds 20, or if the age of a parent is less than 10. These restrictions can be expressed with denial integrity constraints, which prevent some attributes from taking certain values [10]. Other restrictions may be expressed with aggregation ICs, e.g. the maximum concentration of a certain toxin in a sample may not exceed a specified amount, or the number of married men and married women must be the same. Inconsistencies in numerical data can be resolved by changing individual attribute values while keeping the values in the keys, e.g. without changing the household code, the number of children is decreased to an admissible value.

We consider the problem of fixing integer numerical data wrt certain constraints while (a) keeping the values for the attributes in the keys of the relations, and (b) minimizing the quantitative global distance between the original and modified instances. Since the problem may admit several global solutions, each of them possibly involving many individual changes, we are interested in characterizing and computing data and properties that remain invariant under any of these fixing processes. We concentrate on denial and aggregation constraints, and on conjunctive queries, with or without aggregation.

Database repairs have been studied in the context of consistent query answering (CQA), i.e. the process of obtaining the answers to a query that are consistent wrt a given set of ICs [2] (cf. [4] for a survey). There, consistent data is characterized as invariant under all minimal forms of restoring consistency, i.e. as data that is present in all minimally repaired versions of the original instance (the repairs). Thus, an answer to a query is consistent if it can be obtained as a standard answer to the query from every possible repair. In most of the research on CQA, a repair is a new instance that satisfies the given ICs, but differs from the original instance by a minimal set, under set inclusion, of (completely) deleted or inserted tuples. Changing the value of a particular attribute can be modelled as a deletion followed by an insertion, but this may not correspond to a minimal repair. However, in certain applications it may make more sense to correct (update) numerical values only in certain attributes. This requires a new definition of repair that considers: (a) the quantitative nature of individual changes, (b) the association of the numerical values to other key values, and (c) a quantitative distance between database instances.



Example 1. Consider a network traffic database D that stores flow measurements of links in a network. This network has two types of links, labelled 0 and 1, with maximum capacities 1000 and 1500, resp. The relation Traffic(Time, Link, Type, Flow) of D contains the tuples (1.1, a, 0, 1100), (1.1, b, 1, 900) and (1.3, b, 1, 850), and the capacities are expressed by the ICs $\forall T, L, Type, Flow\, \neg(Traffic(T, L, Type, Flow), Type = 0, Flow > 1000)$ and $\forall T, L, Type, Flow\, \neg(Traffic(T, L, Type, Flow), Type = 1, Flow > 1500)$. Database D is inconsistent wrt the first of these ICs. Under the tuple- and set-oriented semantics of repairs [2], there is a unique repair, namely deleting tuple Traffic(1.1, a, 0, 1100). However, we have two options that may make more sense than deleting the flow measurement, namely updating the violating tuple to Traffic(1.1, a, 0, 1000) or to Traffic(1.1, a, 1, 1100), satisfying an implicit requirement that the numbers should not change too much. □

Update-based repairs for restoring consistency are studied in [24], where changing values in attributes of a tuple is made a primitive repair action, and semantic and computational problems around CQA are analyzed from this perspective. However, peculiarities of changing numerical attributes are not considered and, more importantly, the distance between database instances used in [24, 25] is based on set-theoretic homomorphisms rather than being quantitative, as in this paper. In [24] the repaired instances are called fixes, a term that we keep here (instead of repairs), because our basic repair actions are also changes of (numerical) attribute values. In this paper we consider fixable attributes that take integer values and the quadratic, Euclidean distance $L_2$ between database instances. Specific fixes and approximations may be different under other distance functions, e.g. the “city distance” $L_1$ (the sum of absolute differences), but the general (in)tractability and approximation results remain. However, moving to the case of real numbers will certainly bring new issues that require different approaches; they are left for ongoing and future research. Actually, it would be natural to investigate them in the richer context of constraint databases [17].

The problem of attribute-based correction of census data forms is addressed 
in [10] using disjunctive logic programs with stable model semantics. Several 
underlying and implicit assumptions that are necessary for that approach to 
work are made explicit and used here, extending the semantic framework of [10]. 

We provide semantic foundations for fixes that are based on changes of numerical attributes in the presence of key dependencies and wrt denial and aggregate ICs, while keeping the numerical distance to the original database to a minimum. This framework introduces new challenging decision and optimization problems, and many algorithmic and complexity-theoretic issues. In particular, we concentrate on the “Database Fix Problem” (DFP) of determining the existence of a fix at a distance not greater than a given bound, considering in particular the problems of constructing and verifying such a fix. These problems are highly relevant for large inconsistent databases. For example, solving DFP can help us find the minimum distance from a fix to the original instance, information that can be used to prune impossible branches in the process of materializing a fix. The CQA problem of deciding the consistency of query answers is studied wrt decidability, complexity, and approximation under several alternative semantics.

We prove that DFP and CQA become undecidable in the presence of aggregation constraints. However, DFP is NP-complete for linear denials, which are enough to capture census-like applications. CQA belongs to $\Pi^P_2$ and becomes $\Delta^P_2$-hard, but for a relevant class of denials we get tractability of CQA for non-aggregate queries, which is again lost with aggregate queries. Wrt approximation, we prove that DFP is MAXSNP-hard in general, and for a relevant subclass of denials we provide an approximation within a constant factor that depends on the number of atoms in them. All the algorithmic and complexity results, unless otherwise stated, refer to data complexity [1], i.e. to the size of the database, which here includes a binary representation for numbers. For complexity-theoretic definitions and classical results we refer to [20].

This paper is structured as follows. Section 2 introduces basic definitions. Section 3 presents the notion of database fix, several notions of consistent answer to a query, and some relevant decision problems. Section 4 investigates their complexity. In Section 5, approximations for the problem of finding the minimum distance to a fix are studied, obtaining negative results for the general case, but a good approximation for the class of local denial constraints. Section 6 investigates tractability of CQA for conjunctive queries and denial constraints containing one database atom plus built-ins. Section 7 presents some conclusions and refers to related work. Proofs and other auxiliary technical results can be found in Appendix A.1.

2 Preliminaries 

Consider a relational schema $\Sigma = (U, \mathcal{R}, \mathcal{B}, \mathcal{A})$, with domain U that includes $\mathbb{Z}$,¹ $\mathcal{R}$ a set of database predicates, $\mathcal{B}$ a set of built-in predicates, and $\mathcal{A}$ a set of attributes. A database instance is a finite collection D of database tuples, i.e. of ground atoms $P(\bar{c})$, with $P \in \mathcal{R}$ and $\bar{c}$ a tuple of constants in U. There is a set $\mathcal{F} \subseteq \mathcal{A}$ of all the fixable attributes, those that take values in $\mathbb{Z}$ and are allowed to be fixed. Attributes outside $\mathcal{F}$ are called rigid. $\mathcal{F}$ need not contain all the numerical attributes, that is, we may also have rigid numerical attributes.

We also have a set $\mathcal{K}$ of key constraints expressing that relations $R \in \mathcal{R}$ have a primary key $K_R$, with $K_R \subseteq (\mathcal{A} \setminus \mathcal{F})$. Later on (cf. Definition 2), we will assume that $\mathcal{K}$ is satisfied both by the initial instance D, denoted $D \models \mathcal{K}$, and by its fixes. Since $\mathcal{F} \cap K_R = \emptyset$, values in key attributes cannot be changed in a fixing process, so the constraints in $\mathcal{K}$ are hard. In addition, there may be a separate set IC of flexible ICs that may be violated, and it is the job of a fix to restore consistency wrt them (while still satisfying $\mathcal{K}$).

A linear denial constraint [17] has the form $\forall \bar{x}\, \neg(A_1 \wedge \dots \wedge A_m)$, where the $A_i$ are database atoms (i.e. with predicate in $\mathcal{R}$), or built-in atoms of the form $x\,\theta\,c$, where x is a variable, c is a constant and $\theta \in \{=, \neq, <, >, \leq, \geq\}$, or of the form $x = y$. If $x \neq y$ is also allowed, we call them extended linear denials.

¹ With simple denial constraints, numbers can be restricted to, e.g., $\mathbb{N}$ or $\{0, 1\}$.

Example 2. The following are linear denials (we replace $\wedge$ by a comma): (a) No customer is younger than 21: $\forall Id, Age, Income, Status\, \neg(Customer(Id, Age, Income, Status), Age < 21)$. (b) No customer with income less than 60000 has “silver” status: $\forall Id, Age, Income, Status\, \neg(Customer(Id, Age, Income, Status), Income < 60000, Status = silver)$. (c) The constraints in Example 1, e.g. $\forall T, L, Type, Flow\, \neg(Traffic(T, L, Type, Flow), Type = 0, Flow > 1000)$. □
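The following Python sketch illustrates how single-atom linear denials such as those of Example 1 can be checked tuple by tuple: a tuple violates the denial exactly when the conjunction of its built-in atoms holds. It is an illustration under stated assumptions only: the tuple values are those listed for D in Example 4, the Type = 1 denial with bound 1500 is assumed from the capacities given in Example 1, and the names `Traffic`, `DENIALS` and `violations` are hypothetical.

```python
from typing import Callable, NamedTuple


class Traffic(NamedTuple):
    time: float
    link: str
    type: int
    flow: int


# A single-atom linear denial forbids every tuple that satisfies the conjunction
# of its built-in atoms, so it is modelled here as a predicate on one tuple.
Denial = Callable[[Traffic], bool]

DENIALS: dict[str, Denial] = {
    "type0_capacity": lambda t: t.type == 0 and t.flow > 1000,
    # Assumed second denial, derived from the capacity 1500 of type-1 links.
    "type1_capacity": lambda t: t.type == 1 and t.flow > 1500,
}


def violations(db: list[Traffic], denials: dict[str, Denial]) -> list[tuple[str, Traffic]]:
    """Return the (denial name, tuple) pairs in which the tuple violates the denial."""
    return [(name, t) for t in db for name, body in denials.items() if body(t)]


D = [Traffic(1.1, "a", 0, 1100), Traffic(1.1, "b", 1, 900), Traffic(1.3, "b", 1, 850)]
print(violations(D, DENIALS))  # only the first tuple violates type0_capacity
```

Both updates suggested in Example 1, (1.1, a, 0, 1000) and (1.1, a, 1, 1100), pass this check.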

We consider aggregation constraints (ACs) [22] and aggregate queries with sum, count and average. Filtering ACs impose conditions on the tuples over which aggregation is applied, e.g. $sum(A_1 : A_2 = 3) > 5$ is a sum over $A_1$ of the tuples with $A_2 = 3$. Multi-attribute ACs allow arithmetical combinations of attributes as arguments for sum, e.g. $sum(A_1 + A_2) > 5$ and $sum(A_1 \times A_2) > 100$. If an AC has attributes from more than one relation, it is multi-relation, e.g. $sum_{R_1}(A_1) = sum_{R_2}(A_1)$; otherwise it is single-relation.
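As an illustration only, the next sketch evaluates one AC of each kind over small hypothetical relations with two numerical attributes. The helper `agg_sum`, the relations `R1`, `R2` and their data are assumptions of the sketch, not part of the framework.

```python
from typing import Callable

Row = tuple[int, int]  # a tuple (A1, A2) of a relation with two numerical attributes


def agg_sum(rows: list[Row], expr: Callable[[int, int], int],
            cond: Callable[[int, int], bool] = lambda a1, a2: True) -> int:
    """sum(expr : cond): add up expr(A1, A2) over the rows that pass the filter cond."""
    return sum(expr(a1, a2) for a1, a2 in rows if cond(a1, a2))


R1: list[Row] = [(2, 3), (4, 3), (7, 1)]  # hypothetical instance of R1(A1, A2)
R2: list[Row] = [(10, 0), (3, 5)]         # hypothetical instance of R2(A1, A2)

# Filtering AC sum(A1 : A2 = 3) > 5: only (2, 3) and (4, 3) pass, 2 + 4 = 6.
filtering = agg_sum(R1, lambda a1, a2: a1, lambda a1, a2: a2 == 3) > 5
# Multi-attribute ACs: sum(A1 + A2) = 5 + 7 + 8 = 20 and sum(A1 x A2) = 6 + 12 + 7 = 25.
multi_plus = agg_sum(R1, lambda a1, a2: a1 + a2) > 5
multi_times = agg_sum(R1, lambda a1, a2: a1 * a2) > 100
# Multi-relation AC sum_R1(A1) = sum_R2(A1): 2 + 4 + 7 = 13 and 10 + 3 = 13.
multi_relation = agg_sum(R1, lambda a1, a2: a1) == agg_sum(R2, lambda a1, a2: a1)
```

On this data the filtering, the first multi-attribute and the multi-relation constraints hold, while $sum(A_1 \times A_2) > 100$ is violated.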

An aggregate conjunctive query has the form $q(x_1, \dots, x_m, agg(z)) \leftarrow B(x_1, \dots, x_m, z, y_1, \dots, y_n)$, where agg is an aggregation function and its non-aggregate matrix (NAM), given by $q'(x_1, \dots, x_m) \leftarrow B(x_1, \dots, x_m, z, y_1, \dots, y_n)$, is a usual first-order (FO) conjunctive query with built-in atoms, such that the aggregation attribute z does not appear among the $x_i$. Here we use the set semantics. An aggregate conjunctive query is cyclic (acyclic) if its NAM is cyclic (acyclic) [1].

Example 3. $q(x, y, sum(z)) \leftarrow R(x, y), Q(y, z, w), w \neq 3$ is an aggregate conjunctive query, with aggregation attribute z. Each answer (x, y) to its NAM, i.e. to $q(x, y) \leftarrow R(x, y), Q(y, z, w), w \neq 3$, is expanded to (x, y, sum(z)) as an answer to the aggregate query, where sum(z) is the sum of all the values for z having a w such that (x, y, z, w) makes $R(x, y), Q(y, z, w), w \neq 3$ true. In the database instance $D = \{R(1,2), R(2,3), Q(2,5,9), Q(2,6,7), Q(3,1,1), Q(3,1,5), Q(3,8,3)\}$ the answer set for the aggregate query is $\{(1, 2, 5 + 6), (2, 3, 1 + 1)\}$. □
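The answer set of Example 3 can be reproduced with the following sketch, which evaluates the body over the given instance and then sums z per NAM answer (x, y). It assumes that each distinct satisfying binding (x, y, z, w) contributes its z once, which is why Q(3,1,1) and Q(3,1,5) both contribute 1; the function name `sum_query` is hypothetical.

```python
R = {(1, 2), (2, 3)}
Q = {(2, 5, 9), (2, 6, 7), (3, 1, 1), (3, 1, 5), (3, 8, 3)}


def sum_query(r: set[tuple[int, int]], q: set[tuple[int, int, int]]) -> set[tuple[int, int, int]]:
    """Evaluate q(x, y, sum(z)) <- R(x, y), Q(y, z, w), w != 3."""
    # Distinct satisfying bindings of the body (set semantics).
    bindings = {(x, y, z, w) for (x, y) in r for (y2, z, w) in q if y2 == y and w != 3}
    totals: dict[tuple[int, int], int] = {}
    for x, y, z, _w in bindings:
        totals[(x, y)] = totals.get((x, y), 0) + z  # aggregate z per NAM answer (x, y)
    return {(x, y, s) for (x, y), s in totals.items()}


result = sum_query(R, Q)  # {(1, 2, 11), (2, 3, 2)}, i.e. (1, 2, 5 + 6) and (2, 3, 1 + 1)
```

The tuple Q(3, 8, 3) is discarded by the built-in $w \neq 3$, so it does not contribute to the answer for (2, 3).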

An aggregate comparison query is a sentence of the form $q(agg(z)), agg(z)\,\theta\,k$, where $q(agg(z))$ is the head of a scalar aggregate conjunctive query (with no free variables), $\theta$ is a comparison operator, and k is an integer. For example, the following is an aggregate comparison query asking whether the aggregated value obtained via $q(sum(z))$ is greater than 5: $\mathcal{Q}: q(sum(z)), sum(z) > 5$, with $q(sum(z)) \leftarrow R(x, y), Q(y, z, w)$.

3 Least Squares Fixes 

When we update numerical values to restore consistency, it is desirable to make the smallest overall variation of the original values, while considering the relative relevance or specific scale of each of the fixable attributes. Since the original instance and a fix will share the same key values (cf. Definition 2), we can use them to compute variations in the numerical values. For a tuple $\bar{k}$ of values for the key $K_R$ of relation R in an instance D, $t(\bar{k}, R, D)$ denotes the unique tuple t in relation R in instance D whose key value is $\bar{k}$. To each attribute $A \in \mathcal{F}$ a fixed numerical weight $\alpha_A$ is assigned.

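Definition 1. For instances D and D' over schema $\Sigma$ with the same set $val(K_R)$ of tuples of key values for each relation $R \in \mathcal{R}$, their square distance is

$$\Delta_{\bar{\alpha}}(D, D') = \sum_{R \in \mathcal{R},\ A \in \mathcal{F}} \ \sum_{\bar{k} \in val(K_R)} \alpha_A \big(\pi_A(t(\bar{k}, R, D)) - \pi_A(t(\bar{k}, R, D'))\big)^2,$$

where $\pi_A$ is the projection on attribute A and $\bar{\alpha} = (\alpha_A)_{A \in \mathcal{F}}$. □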



Definition 2. For an instance D, a set of fixable attributes $\mathcal{F}$, a set of key dependencies $\mathcal{K}$ such that $D \models \mathcal{K}$, and a set of flexible ICs IC: a fix for D wrt IC is an instance D' such that: (a) D' has the same schema and domain as D; (b) D' has the same values as D in the attributes in $\mathcal{A} \setminus \mathcal{F}$; (c) $D' \models \mathcal{K}$; and (d) $D' \models IC$. A least squares fix (LS-fix) for D is a fix D' that minimizes the square distance $\Delta_{\bar{\alpha}}(D, D')$ over all the instances that satisfy (a)-(d). □


In general we are interested in LS-fixes, but (not necessarily minimal) fixes will be useful auxiliary instances.

Example 4. (Example 1 continued) $\mathcal{R} = \{Traffic\}$, $\mathcal{A} = \{Time, Link, Type, Flow\}$, $K_{Traffic} = \{Time, Link\}$, $\mathcal{F} = \{Type, Flow\}$, with weights $\alpha_{Type} = 1$ and $\alpha_{Flow} = 10^{-5}$. For the original instance D, $val(K_{Traffic}) = \{(1.1, a), (1.1, b), (1.3, b)\}$, $t((1.1, a), Traffic, D) = (1.1, a, 0, 1100)$, etc. Fixes are $D_1 = \{(1.1, a, 0, 1000), (1.1, b, 1, 900), (1.3, b, 1, 850)\}$ and $D_2 = \{(1.1, a, 1, 1100), (1.1, b, 1, 900), (1.3, b, 1, 850)\}$, with distances $\Delta_{\bar{\alpha}}(D, D_1) = 100^2 \times 10^{-5} = 10^{-1}$ and $\Delta_{\bar{\alpha}}(D, D_2) = 1^2 \times 1 = 1$, resp. Therefore, $D_1$ is the only LS-fix. □
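A minimal sketch of the weighted square distance of Definition 1, applied to Example 4. As an assumption of the sketch, an instance of the single relation Traffic is represented as a dictionary from key values (Time, Link) to the values of the fixable attributes, so the outer sum over relations disappears. The weights 1 for Type and $10^{-5}$ for Flow are the ones that produce the distances stated in the example, and exact fractions avoid floating-point rounding; `square_distance` is a hypothetical name.

```python
from fractions import Fraction

# Key value (Time, Link) -> values of the fixable attributes Type and Flow.
D = {(1.1, "a"): {"Type": 0, "Flow": 1100},
     (1.1, "b"): {"Type": 1, "Flow": 900},
     (1.3, "b"): {"Type": 1, "Flow": 850}}
D1 = {**D, (1.1, "a"): {"Type": 0, "Flow": 1000}}  # Flow lowered to the capacity
D2 = {**D, (1.1, "a"): {"Type": 1, "Flow": 1100}}  # link type changed instead

ALPHA = {"Type": Fraction(1), "Flow": Fraction(1, 10**5)}


def square_distance(d: dict, d_prime: dict, alpha: dict) -> Fraction:
    """Weighted square distance of Definition 1 for one relation."""
    if d.keys() != d_prime.keys():
        raise ValueError("the instances must have the same key values")
    return sum((alpha[a] * (d[k][a] - d_prime[k][a]) ** 2 for k in d for a in alpha),
               Fraction(0))
```

With these weights `square_distance(D, D1, ALPHA)` is 1/10 and `square_distance(D, D2, ALPHA)` is 1, so $D_1$ is the cheaper fix.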


The coefficients $\alpha_A$ can be chosen in many different ways depending on factors like relative relevance of attributes, actual distribution of data, measurement scales, etc. In the rest of this paper we will assume, for simplicity, that $\alpha_A = 1$ for all $A \in \mathcal{F}$, and $\Delta_{\bar{\alpha}}(D, D')$ will be simply denoted by $\Delta(D, D')$.


Example 5. The database D has relations Client(ID, A, M), with key ID, attributes A for age and M for amount of money; and Buy(ID, I, P), with key {ID, I}, I for items, and P for prices. We have denials $IC_1: \forall ID, I, P, A, M\, \neg(Buy(ID, I, P), Client(ID, A, M), A < 18, P > 25)$ and $IC_2: \forall ID, A, M\, \neg(Client(ID, A, M), A < 18, M > 50)$, requiring that people younger than 18 cannot spend more than 25 on one item nor spend more than 50 in the store. We added an extra column in the tables with a label for each tuple. $IC_1$ is violated by $\{t_1, t_4\}$ and $\{t_1, t_5\}$, and $IC_2$ by $\{t_1\}$ and $\{t_2\}$. We have two LS-fixes, D' and D'' (the modified version of tuple $t_1$ is $t_1'$, etc.), with distances $\Delta(D, D') = 2^2 + 1^2 + 2^2 + 1^2 = 10$ and $\Delta(D, D'') = 3^2 + 1^2 = 10$. We can see that a global fix may not be the result of applying “local” minimal fixes to tuples. □
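The claims of Example 5 can be checked with the sketch below. It is restricted, as an assumption, to the tuples $t_1, t_2, t_4, t_5$, whose original values and fixed versions appear in Example 7; the other tuples are consistent and unchanged, so they affect neither the violations nor the distances. All function and variable names are hypothetical.

```python
# Client(ID, A, M) and Buy(ID, I, P), restricted to the tuples involved in violations.
D_client = {"t1": (1, 15, 52), "t2": (2, 16, 51)}
D_buy = {"t4": (1, "CD", 27), "t5": (1, "DVD", 26)}

# The two LS-fixes D' and D''; unchanged tuples are copied from D.
D1_client = {"t1": (1, 15, 50), "t2": (2, 16, 50)}
D1_buy = {"t4": (1, "CD", 25), "t5": (1, "DVD", 25)}
D2_client = {"t1": (1, 18, 52), "t2": (2, 16, 50)}
D2_buy = dict(D_buy)


def consistent(client: dict, buy: dict) -> bool:
    """Check IC1 (join of Buy and Client on ID) and IC2 (Client only)."""
    ic1 = not any(cid == bid and age < 18 and price > 25
                  for cid, age, _m in client.values() for bid, _i, price in buy.values())
    ic2 = not any(age < 18 and money > 50 for _cid, age, money in client.values())
    return ic1 and ic2


def distance(old: dict, new: dict, positions: tuple[int, ...]) -> int:
    """Sum of squared changes at the fixable positions (unit weights)."""
    return sum((old[t][i] - new[t][i]) ** 2 for t in old for i in positions)


def total_distance(client: dict, buy: dict) -> int:
    # A and M are positions 1 and 2 of Client; P is position 2 of Buy.
    return distance(D_client, client, (1, 2)) + distance(D_buy, buy, (2,))
```

Both candidate instances pass `consistent` and have `total_distance` 10, although D' changes four tuples and D'' only two.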
The built-in atoms in linear denials determine a solution space for fixes as an intersection of half-spaces, and LS-fixes can be found at its “borders” (cf. the previous example and Proposition A.1 in Appendix A.1). It is easy to construct examples with an exponential number of fixes. For the kind of fixes and ICs we are considering, it is possible that no fix exists, in contrast to [2, 3], where, if the set of ICs is consistent as a set of logical sentences, a fix for a database always exists.

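Example 6. R(X, Y) has key X and fixable Y. $IC_1 = \{\forall X_1 X_2 Y\, \neg(R(X_1, Y), R(X_2, Y), X_1 = 1, X_2 = 2),\ \forall X_1 X_2 Y\, \neg(R(X_1, Y), R(X_2, Y), X_1 = 1, X_2 = 3),\ \forall X_1 X_2 Y\, \neg(R(X_1, Y), R(X_2, Y), X_1 = 2, X_2 = 3),\ \forall X Y\, \neg(R(X, Y), Y > 3),\ \forall X Y\, \neg(R(X, Y), Y < 2)\}$ is consistent. The first three ICs force Y to be different in every tuple. The last two ICs require $2 \leq Y \leq 3$. The inconsistent database $R = \{(1, -1), (2, 1), (3, 5)\}$ has no fix, since its three tuples would need pairwise different values of Y in $\{2, 3\}$. Now, for $IC_2$ with $\forall X, Y\, \neg(R(X, Y), Y > 1)$ and $sum(Y) = 10$, any database with fewer than 10 tuples has no fixes, since every tuple can contribute at most 1 to the sum. □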
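Because the last two denials of $IC_1$ confine every Y to {2, 3}, the claim that the database of Example 6 has no fix can be confirmed by exhaustive search over that finite range. The sketch below does this for the keys 1, 2, 3; encoding an instance as a dict from X to Y, and the names `satisfies_ic1` and `fixes`, are assumptions of the sketch.

```python
from itertools import product

KEYS = (1, 2, 3)


def satisfies_ic1(inst: dict[int, int]) -> bool:
    """IC1 of Example 6 on an instance given as {X: Y} with keys 1, 2, 3."""
    pairwise_distinct = len(set(inst.values())) == len(inst)  # first three denials
    in_range = all(2 <= y <= 3 for y in inst.values())        # last two denials
    return pairwise_distinct and in_range


# Any Y outside {2, 3} violates one of the last two denials, so searching
# only over {2, 3} for every key covers all candidate fixes.
candidates = (dict(zip(KEYS, ys)) for ys in product(range(2, 4), repeat=len(KEYS)))
fixes = [inst for inst in candidates if satisfies_ic1(inst)]
print(fixes)  # [] : three keys cannot take pairwise different values in {2, 3}
```

With only two keys the same search would succeed, e.g. {1: 2, 2: 3} satisfies the constraints.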

Proposition 1. If D has a fix wrt IC, then it also has an LS-fix wrt IC. □ 

4 Decidability and Complexity 

In applications where fixes are based on changes of numerical values, computing concrete fixes is a relevant problem. In databases containing census forms, correcting the latter before doing statistical processing is a common problem [10]. In databases with experimental samples, we can fix certain erroneous quantities as specified by linear ICs. In these cases, the fixes are relevant objects to compute explicitly, which contrasts with CQA [2], where the main motivation for introducing repairs is to formally characterize the notion of a consistent answer to a query as an answer that remains invariant under all possible repairs. Consequently, we now consider some decision problems related to existence and verification of LS-fixes, and to CQA under different semantics.

Definition 3. For an instance D and a set IC of ICs:

(a) $Fix(D, IC) := \{D' \mid D' \text{ is an LS-fix of } D \text{ wrt } IC\}$.

(b) $Fix(IC) := \{(D, D') \mid D' \in Fix(D, IC)\}$, the fix checking problem.

(c) $NE(IC) := \{D \mid Fix(D, IC) \neq \emptyset\}$, for non-empty set of fixes, i.e. the problem of checking existence of LS-fixes.

(d) $NE := \{(D, IC) \mid Fix(D, IC) \neq \emptyset\}$.

(e) $DFP(IC) := \{(D, k) \mid \text{there is } D' \in Fix(D, IC) \text{ with } \Delta(D, D') \leq k\}$, the database fix problem, i.e. the problem of checking existence of LS-fixes within a given positive distance k.

(f) DFOP(IC) is the optimization problem of finding the minimum distance from an LS-fix wrt IC to a given input instance. □

Definition 4. Let D be a database, IC a set of ICs, and Q a conjunctive query.²
(a) A ground tuple $\bar{t}$ is a consistent answer to $Q(\bar{x})$ under the: (a1) skeptical semantics if for every $D' \in Fix(D, IC)$, $D' \models Q(\bar{t})$; (a2) brave semantics if there exists $D' \in Fix(D, IC)$ with $D' \models Q(\bar{t})$; (a3) majority semantics if $|\{D' \mid D' \in Fix(D, IC) \text{ and } D' \models Q(\bar{t})\}| > |\{D' \mid D' \in Fix(D, IC) \text{ and } D' \not\models Q(\bar{t})\}|$.

(b) That $\bar{t}$ is a consistent answer to Q in D under semantics S is denoted by $D \models_S Q[\bar{t}]$. If Q is ground and $D \models_S Q$, we say that yes is a consistent answer, meaning that Q is true in the fixes of D according to semantics S. CA(Q, D, IC, S) is the set of consistent answers to Q in D wrt IC under semantics S. For ground Q, if $CA(Q, D, IC, S) \neq \{yes\}$, then $CA(Q, D, IC, S) := \{no\}$.

(c) $CQA(Q, IC, S) := \{(D, \bar{t}) \mid \bar{t} \in CA(Q, D, IC, S)\}$ is the decision problem of consistent query answering, i.e. of checking consistent answers. □

² Whenever we say just “conjunctive query”, we understand it is a non-aggregate query.
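Once the set of LS-fixes is available, the three semantics of Definition 4 reduce to simple counting. The sketch below assumes the fixes are given as a finite list and a ground query is given as a Python predicate on an instance; the function names, the instance encoding and the query `is_minor` ("client 1 is younger than 18", evaluated on the two LS-fixes of Example 5 as listed in Example 7) are hypothetical illustrations.

```python
from typing import Callable, Sequence, TypeVar

Inst = TypeVar("Inst")  # whatever representation is used for a database instance


def skeptical(fixes: Sequence[Inst], holds: Callable[[Inst], bool]) -> bool:
    """Q holds in every LS-fix (vacuously true when there are no fixes)."""
    return all(holds(f) for f in fixes)


def brave(fixes: Sequence[Inst], holds: Callable[[Inst], bool]) -> bool:
    """Q holds in at least one LS-fix."""
    return any(holds(f) for f in fixes)


def majority(fixes: Sequence[Inst], holds: Callable[[Inst], bool]) -> bool:
    """Q holds in strictly more LS-fixes than it fails in."""
    yes = sum(1 for f in fixes if holds(f))
    return yes > len(fixes) - yes


# Client tuple t1 = (ID, A, M) in the two LS-fixes of Example 5.
fixes = [{"t1": (1, 15, 50)}, {"t1": (1, 18, 52)}]


def is_minor(f: dict) -> bool:
    """Hypothetical ground query: client 1 is younger than 18."""
    return f["t1"][1] < 18
```

Here the answer is yes only under the brave semantics: the query holds in one fix and fails in the other. Note that `skeptical` returns True on an empty list of fixes, which is the behaviour Proposition 2 relies on.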

Proposition 2. NE{IC) can be reduced in polynomial time to the complements 
of CQA(False, IC, Skeptical) and CQA{True, IC, Majority), where False, True 
are ground queries that are always false, resp. true. □ 

In Proposition 2, it suffices for queries False, True to be false, resp. true, in all instances that share the key values with the input database. Then, they can be represented by $\exists \bar{Y}\, R(\bar{c}, \bar{Y})$, where $\bar{c}$ are not (for False), or are (for True) key values in the original instance.

Theorem 1. Under extended linear denials and complex, filtering, multi-attribute, single-relation, aggregation constraints, the problems NE of existence of LS-fixes, and CQA under skeptical or majority semantics are undecidable. □

The result about NE can be proved by reduction from the undecidable Hilbert's tenth problem on the solvability of Diophantine equations. For CQA, apply Proposition 2. Here we have the original database and the set of ICs as input parameters.
In the following we will be interested in data complexity, when only the input 
database varies and the set of ICs is fixed [1]. 

Theorem 2. For a fixed set IC of linear denials: (a) Deciding if for an instance D there is an instance D' (with the same key values as D) that satisfies IC with $\Delta(D, D') \leq k$, with positive integer k that is part of the input, is in NP. (b) DFP(IC) is NP-complete (cf. Definition 3(e)). □

By Proposition 1, there is a fix for D wrt IC at a distance $\leq k$ iff there is an LS-fix at a distance $\leq k$. Part (b) of Theorem 2 follows from part (a) and a reduction of Vertex Cover to $DFP(IC_0)$, for a fixed set of denials $IC_0$. By Theorem 2(a), if there is a fix at a distance $\leq k$, the minimum distance to D for a fix can be found by binary search in log(k) steps. Actually, if an LS-fix exists, its square distance to D is polynomially bounded by the size of D (cf. proof of Theorem 3). Since D and a fix have the same number of tuples, only the size of their values in a fix matters, and these values are constrained by a fixed set of linear denials and the condition of minimality.
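Since the existence of a fix at distance at most d is monotone in d, the binary search mentioned above is standard. The sketch below abstracts the NP procedure of Theorem 2(a) as a hypothetical oracle `dfp` that answers "is there a fix at square distance at most d?"; the oracle and the name `min_fix_distance` are assumptions, and the sketch only shows the O(log k) sequence of oracle calls.

```python
from typing import Callable, Optional


def min_fix_distance(dfp: Callable[[int], bool], k: int) -> Optional[int]:
    """Smallest d in [0, k] such that dfp(d) holds, or None if dfp(k) fails.

    dfp(d) is a hypothetical oracle for "is there a fix at square distance <= d?";
    it is monotone in d, which is what makes binary search valid.
    """
    if not dfp(k):
        return None
    lo, hi = 0, k  # invariant: dfp(hi) holds and no d < lo satisfies dfp
    while lo < hi:
        mid = (lo + hi) // 2
        if dfp(mid):
            hi = mid
        else:
            lo = mid + 1
    return hi
```

For instance, with an oracle that accepts exactly the bounds d >= 10 (the minimum distance in Example 5), `min_fix_distance(oracle, 1000)` returns 10 after about ten oracle calls.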

Theorem 3. For a fixed set IC of extended linear denials: (a) The problem NE(IC) of deciding if an instance has an LS-fix wrt IC is NP-complete, and (b) CQA under the skeptical and the majority semantics is coNP-hard. □



For hardness in (a) and (b) of Theorem 3, linear denials suffice. Membership in (a) can be obtained for any fixed finite set of extended denials. Part (b) follows from part (a). The latter uses a reduction from 3-Colorability.

Theorem 4. For a fixed set IC of extended linear denials: (a) The problem Fix(IC) of checking if an instance is an LS-fix is coNP-complete, and (b) CQA under skeptical semantics is in $\Pi^P_2$, and, for ground atomic queries, $\Delta^P_2$-hard. □

Part (a) uses 3SAT. Hardness in (b) is obtained by reduction from a $\Delta^P_2$-complete decision version of the problem of searching for the lexicographically Maximum 3-Satisfying Assignment (MSSA): decide if the last variable takes value 1 in it [16, Theo. 3.4]. Linear denials suffice. Now, by reduction from the Vertex Cover Problem, we obtain:

Theorem 5. For aggregate comparison queries using sum, CQA under linear denials and brave semantics is coNP-hard. □


5 Approximation for the Database Fix Problem 

We consider the problem of finding a good approximation for the general optimization problem DFOP(IC).

Proposition 3. For a fixed set of linear denials IC, DFOP(IC) is MAXSNP-hard. □

This result is obtained by establishing an L-reduction to DFOP(IC) from the MAXSNP-complete [21, 20] B-Minimum Vertex Cover Problem, i.e. the vertex cover minimization problem for graphs of bounded degree [15, Chapter 10]. As an immediate consequence, we obtain that DFOP(IC) cannot be uniformly approximated within arbitrarily small constant factors [20].

Corollary 1. Unless P = NP, there is no Polynomial Time Approximation Scheme for DFOP. □

This negative result does not preclude the possibility of finding an efficient algorithm for approximation within a constant factor for DFOP. Actually, in the following we give such an algorithm for a restricted but still useful class of denial constraints.


5.1 Local denials 

Definition 5. A set of linear denials IC is local if: (a) attributes participating in equality atoms between attributes or in joins are all rigid; (b) there is a built-in atom with a fixable attribute in each element of IC; (c) no attribute A appears in IC both in comparisons of the form $A < c_1$ and $A > c_2$.³ □

³ To check condition (c), $x \leq c$, $x \geq c$, $x \neq c$ have to be expressed using <, >, e.g. $x \leq c$ by $x < c + 1$.



In Example 5, IC is local. In Example 6, $IC_1$ is not local. Local constraints have the property that by doing local fixes, no new inconsistencies are generated, and there is always an LS-fix wrt them (cf. Proposition A.2 in Appendix A.1). Locality is a sufficient, but not a necessary condition for the existence of LS-fixes, as can be seen from the database $\{P(a, 2)\}$, with the first attribute as a key and the non-local denials $\neg(P(x, y), y < 3)$, $\neg(P(x, y), y > 5)$, which has the LS-fix $\{P(a, 3)\}$.

Proposition 4. For the class of local denials, DFP is NP-complete, and DFOP is MAXSNP-hard. □

This proposition tells us that the problem of finding good approximations in the 
case of local denials is still relevant. 

Definition 6. A set I of database tuples from D is a violation set for $ic \in IC$ if $I \not\models ic$, and for every $I' \subsetneq I$, $I' \models ic$. $\mathcal{I}(D, ic, t)$ denotes the set of violation sets for ic that contain tuple t. □

A violation set I for ic is a minimal set of tuples that simultaneously participate 
in the violation of ic. 

Definition 7. Given an instance D and ICs IC, a local fix for $t \in D$ is a tuple t' with: (a) the same values for the rigid attributes as t; (b) $S(t, t') := \{I \mid \text{there is } ic \in IC,\ I \in \mathcal{I}(D, ic, t) \text{ and } ((I \setminus \{t\}) \cup \{t'\}) \models ic\} \neq \emptyset$; and (c) there is no tuple t'' that simultaneously satisfies (a), $S(t, t'') = S(t, t')$, and $\Delta(\{t\}, \{t''\}) < \Delta(\{t\}, \{t'\})$, where $\Delta$ denotes quadratic distance. □

$S(t, t')$ contains the violation sets that include t and are solved by replacing t by t'. A local fix t' solves some of the violations due to t and, among the tuples that solve exactly the same violation sets, minimizes the distance to t.


5.2 Database fix problem as a set cover problem 

For a fixed set IC of local denials, we can solve an instance of DFOP by transforming it into an instance of the Minimum Weighted Set Cover Optimization Problem (MWSCP). This problem is MAXSNP-hard [19, 20], and its general approximation algorithms are within a logarithmic factor [19, 8]. By concentrating on local denials, we will be able to generate a version of the MWSCP that can be approximated within a constant factor (cf. Section 5.3).

Definition 8. For a database D and a set IC of local denials, $G(D, IC) = (T, H)$ denotes the conflict hypergraph for D wrt IC [7], which has in the set T of vertices the database tuples, and in the set H of hyperedges, the violation sets for elements $ic \in IC$. □

Hyperedges in H can be labelled with the corresponding ic, so that different hyperedges may contain the same tuples. Now we build an instance of MWSCP.
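For the denials of Example 5, the labelled hyperedges of $G(D, IC)$ can be computed directly, as in the sketch below. It uses only the tuples $t_1, t_2, t_4, t_5$ with the values given in Example 7 (an assumption: the remaining, consistent tuples lie in no hyperedge), and the names `CLIENT`, `BUY` and `conflict_hyperedges` are hypothetical. Minimality of each violation set is immediate here: $IC_1$ has one Client atom and one Buy atom, so no single tuple violates it.

```python
# Tuples of Example 5 involved in violations (values as given in Example 7).
CLIENT = {"t1": (1, 15, 52), "t2": (2, 16, 51)}    # Client(ID, A, M)
BUY = {"t4": (1, "CD", 27), "t5": (1, "DVD", 26)}  # Buy(ID, I, P)


def conflict_hyperedges() -> list[tuple[str, frozenset[str]]]:
    """Hyperedges of G(D, IC), each labelled with the denial it violates."""
    edges = []
    # IC1 needs a Client tuple and a Buy tuple, so no single tuple can violate it:
    # every violating (Client, Buy) pair is already a minimal violation set.
    for c, (cid, age, _money) in CLIENT.items():
        for b, (bid, _item, price) in BUY.items():
            if cid == bid and age < 18 and price > 25:
                edges.append(("IC1", frozenset({c, b})))
    # IC2 has a single Client atom: each violating tuple is a violation set on its own.
    for c, (_cid, age, money) in CLIENT.items():
        if age < 18 and money > 50:
            edges.append(("IC2", frozenset({c})))
    return edges
```

The result consists of $\{t_1, t_4\}$ and $\{t_1, t_5\}$ for $IC_1$, and $\{t_1\}$ and $\{t_2\}$ for $IC_2$, matching the violation sets listed in Example 5.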



Definition 9. For a database D and a set IC of local denials, the instance (U, S, w) for the MWSCP, where U is the underlying set, S is the set collection, and w is the weight function, is given by: (a) U := H, the set of hyperedges of G(D, IC). (b) S contains the S(t, t'), where t' is a local fix for a tuple $t \in D$. (c) $w(S(t, t')) := \Delta(\{t\}, \{t'\})$. □

It can be proved that the S(t, t') in this construction are non-empty, and that S covers U (cf. Proposition A.2 in Appendix A.1).

If for the instance (U, S, w) of MWSCP we find a minimum weight cover C, we could think of constructing a fix by replacing each inconsistent tuple $t \in D$ by a local fix t' with $S(t, t') \in C$. The problem is that there might be more than one such t', and then the key dependencies would not be respected. Fortunately, this problem can be circumvented.

Definition 10. Let C be a cover for instance (U, S, w) of the MWSCP associated to D, IC. (a) $C^*$ is obtained from C as follows: for each tuple t with local fixes $t_1, \dots, t_n$, $n > 1$, such that $S(t, t_i) \in C$, replace in C all the $S(t, t_i)$ by a single $S(t, t^*)$, where $t^*$ is such that $S(t, t^*) = \bigcup_{i=1}^{n} S(t, t_i)$. (b) D(C) is the database instance obtained from D by replacing t by t' if $S(t, t') \in C^*$. □

It holds (cf. Proposition A.3 in Appendix A.1) that such an $S(t, t^*) \in S$ exists in part (a) of Definition 10. Notice that there, tuple t could have other S(t, t') outside C. Now we can show that the reduction to MWSCP keeps the value of the objective function.

Proposition 5. If C is an optimal cover for instance (U, S, w) of the MWSCP associated to D, IC, then D(C) is an LS-fix of D wrt IC, and $\Delta(D, D(C)) = w(C) = w(C^*)$. □

Proposition 6. For every LS-fix D' of D wrt a set of local denials IC, there exists an optimal cover C for the associated instance (U, S, w) of the MWSCP, such that D' = D(C). □

Proposition 7. The transformation of DFOP into MWSCP, and the construction of database instance D(C) from a cover C for (U, S, w) can be done in polynomial time in the size of D. □

We have established that the transformation of DFOP into MWSCP is an L-reduction [20]. Proposition 7 proves, in particular, that the number of sets S(t, t') is polynomially bounded by the size of the original database D.

Example 7. (Example 5 continued) We illustrate the reduction from DFOP to MWSCP. The violation sets are $\{t_1, t_4\}$ and $\{t_1, t_5\}$ for $IC_1$, and $\{t_1\}$ and $\{t_2\}$ for $IC_2$. The figure shows the hypergraph. For the MWSCP instance, we need the local fixes. Tuple $t_1$ has two local fixes: $t_1' = (1, 15, 50)$, which solves the violation set $\{t_1\}$ of $IC_2$ (hyperedge B), and $t_1'' = (1, 18, 52)$, which solves the violation sets $\{t_1, t_4\}$ and $\{t_1, t_5\}$ of $IC_1$, and $\{t_1\}$ of $IC_2$ (hyperedges A, B, C), with weights 4 and 9, resp. Tuples $t_2$, $t_4$ and $t_5$ have one local fix each, corresponding to (2, 16, 50), (1, CD, 25) and (1, DVD, 25), resp. The consistent tuple $t_3$ has no local fix.



Set           S1     S2     S3     S4     S5
Local fix     t1'    t1''   t2'    t4'    t5'
Weight        4      9      1      4      1
Hyperedge A   0      1      0      1      0
Hyperedge B   1      1      0      0      0
Hyperedge C   0      1      0      0      1
Hyperedge D   0      0      1      0      0


The MWSCP instance is shown in the table, where the elements are rows and the sets (e.g. $S_1 = S(t_1, t_1')$) are columns. An entry 1 means that the set contains the corresponding element, and a 0 that it does not. There are two minimal covers, both with weight 10: $C_1 = \{S_2, S_3\}$ and $C_2 = \{S_1, S_3, S_4, S_5\}$. $D(C_1)$ and $D(C_2)$ are the two fixes for this problem. □
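To double-check the two optimal covers of Example 7, the following minimal Python sketch encodes the table above (hyperedges A to D as the elements, the local fixes as weighted sets) and enumerates all covers exhaustively. The names SETS, WEIGHTS and optimal_covers are illustrative only; exhaustive search is exponential in the number of sets and serves as a check on this toy instance, not as a way to solve MWSCP in general.

```python
from itertools import combinations

# MWSCP instance of Example 7: each set lists the hyperedges (violation sets)
# solved by one local fix; its weight is the square distance of that fix.
SETS = {
    "S1": frozenset("B"),    # t1'
    "S2": frozenset("ABC"),  # t1''
    "S3": frozenset("D"),    # t2'
    "S4": frozenset("A"),    # t4'
    "S5": frozenset("C"),    # t5'
}
WEIGHTS = {"S1": 4, "S2": 9, "S3": 1, "S4": 4, "S5": 1}
UNIVERSE = frozenset("ABCD")

def optimal_covers(sets, weights, universe):
    """Return (minimum weight, all covers of that weight) by exhaustive search."""
    best, covers = None, []
    names = sorted(sets)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            covered = frozenset().union(*(sets[s] for s in combo))
            if covered != universe:
                continue  # not a cover
            w = sum(weights[s] for s in combo)
            if best is None or w < best:
                best, covers = w, [set(combo)]
            elif w == best:
                covers.append(set(combo))
    return best, covers

best, covers = optimal_covers(SETS, WEIGHTS, UNIVERSE)
print(best, covers)
```

Every cover must contain S3 (the only set covering D); the search then finds exactly $\{S_2, S_3\}$ and $\{S_1, S_3, S_4, S_5\}$, both of weight 10.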

If we apply the transformation to Example 6, which had a non-local set of ICs and no repairs, we find that the instance D(C), for C a set cover, can be constructed as above, but it does not satisfy the flexible ICs: replacing inconsistent tuples by their local fixes solves only the initial inconsistencies, and new inconsistencies are introduced.


5.3 Approximation via set cover optimization 

Now that we have transformed the database fix problem into a weighted set cover problem, we can apply approximation algorithms for the latter. We know, for example, that using a greedy algorithm, MWSCP can be approximated within a factor log(N), where N is the size of the underlying set U [8]. The approximation algorithm returns not only an approximation w to the optimal weight $w^o$, but also a (not necessarily optimal) cover C for problem (U, S, w). As in Definition 10, C can be used to generate a fix D(C) for D that may not be LS-minimal.

Example 8. (Examples 5 and 7 continued) We show how to compute a solution to this particular instance of DFOP using the greedy approximation algorithm for MWSCP presented in [8]. We start with $C := \emptyset$ and $S_i^0 := S_i$, and we add to C the $S_i$ such that $S_i^0$ has the maximum contribution ratio $|S_i^0|/w(S_i^0)$. The alternatives are $|S_1|/w(S_1) = 1/4$, $|S_2|/w(S_2) = 3/9$, $|S_3|/w(S_3) = 1$, $|S_4|/w(S_4) = 1/4$ and $|S_5|/w(S_5) = 1$. The ratio is maximum for $S_3$ and $S_5$, so we can add either of them to C. If we choose the first, we get $C = \{S_3\}$. Now we compute $S_i^1 := S_i^0 \setminus S_3$, and again choose an $S_i$ for C such that $S_i^1$ maximizes the contribution ratio. Now $S_5$ is added to C, because $S_5^1$ gives the maximum. By repeating this process until all the elements of U are covered, i.e. until all the $S_i^k$ become empty at some iteration point k, we finally obtain $C = \{S_3, S_5, S_1, S_4\}$. In this case C is an optimal cover and therefore D(C) is exactly an LS-fix, namely D' in Example 5. Since this is an approximation algorithm, in other examples the cover obtained might not be optimal. □
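The greedy run of Example 8 can be replayed with the short Python sketch below. At each step it picks the set with the largest ratio between the number of still-uncovered elements it contains and its weight, i.e. the contribution ratio above computed on the residual sets. The tie-breaking rule (first set in listing order) is an assumption of the sketch, chosen so that $S_3$ is preferred over $S_5$ as in the example; exact fractions avoid floating-point ties. Weights are assumed to be positive.

```python
from fractions import Fraction

# Example 7 instance: elements are hyperedges, sets are local fixes.
SETS = {"S1": {"B"}, "S2": {"A", "B", "C"}, "S3": {"D"}, "S4": {"A"}, "S5": {"C"}}
WEIGHTS = {"S1": 4, "S2": 9, "S3": 1, "S4": 4, "S5": 1}

def greedy_cover(sets, weights):
    """Greedy weighted set cover: repeatedly add the set maximizing
    (number of uncovered elements it contains) / weight. Ties go to the set
    listed first, because max() returns the first maximal item."""
    uncovered = set().union(*sets.values())
    cover = []
    while uncovered:
        best = max(
            (s for s in sets if s not in cover),
            key=lambda s: Fraction(len(sets[s] & uncovered), weights[s]),
        )
        cover.append(best)
        uncovered -= sets[best]
    return cover

cover = greedy_cover(SETS, WEIGHTS)
print(cover, sum(WEIGHTS[s] for s in cover))
```

The sketch selects $S_3$, $S_5$, $S_1$ and $S_4$ in this order, for a total weight of 10, matching the cover obtained in the example.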




Proposition 8. Given a database instance D with local ICs IC, the database instance D(C) obtained from the approximate cover C is a fix and it holds that $\Delta(D, D(C)) \leq \log(N) \times \Delta(D, D')$, where D' is any LS-fix of D wrt IC and N is the number of violation sets for D wrt IC. □

In consequence, for any set IC of local denials, we have a polynomial time approximation algorithm that solves DFOP(IC) within an $O(\log(N))$ factor, where N is the number of violation sets for D wrt IC. As mentioned before, this number N, the number of hyperedges in $\mathcal{G}$, is polynomially bounded by |D| (cf. Proposition 7). N may be small if the number of inconsistencies or the number of database atoms in the ICs are small, which is likely the case in real applications.

However, in our case we can get even better approximations via a cover C obtained with an approximation algorithm for the special case of the MWSCP where the number of occurrences of an element of U in elements of S is bounded by a constant. For this case of the MWSCP there are approximations within a constant factor based on “linear relaxation” [15, Chapter 3]. This is clearly the case in our application, since $m \times |F| \times |IC|$ is a constant bound (independent of |U|) on the frequency of the elements, where m is the maximum number of database atoms in an IC.
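As an illustration of a constant-factor algorithm for bounded frequency, the sketch below uses the primal-dual (local-ratio) method of Bar-Yehuda and Even instead of explicitly solving the linear relaxation; it is a swapped-in technique, not necessarily the algorithm referred to in [15, Chapter 3]. For each uncovered element it lowers the residual weights of all sets containing it by the smallest such residual weight and takes every set whose residual weight reaches zero; the result weighs at most f times the optimum, where f bounds the frequency of the elements (here, f plays the role of $m \times |F| \times |IC|$). A final reverse-delete pass drops redundant sets, which can only lower the weight. Function and variable names are hypothetical.

```python
# Example 7 instance: elements are hyperedges, sets are local fixes.
SETS = {"S1": {"B"}, "S2": {"A", "B", "C"}, "S3": {"D"}, "S4": {"A"}, "S5": {"C"}}
WEIGHTS = {"S1": 4, "S2": 9, "S3": 1, "S4": 4, "S5": 1}

def frequency_cover(sets, weights):
    """Primal-dual (local-ratio) set cover: weight <= f * optimum,
    where f is the largest number of sets containing a single element."""
    universe = set().union(*sets.values())
    residual = dict(weights)
    cover, covered = [], set()
    for e in sorted(universe):
        if e in covered:
            continue
        containing = [s for s in sets if e in sets[s]]
        delta = min(residual[s] for s in containing)
        for s in containing:
            residual[s] -= delta
            if residual[s] == 0:  # the set became tight: take it
                cover.append(s)
                covered |= sets[s]
    # Reverse delete: drop sets whose elements are covered by the other sets.
    for s in reversed(list(cover)):
        rest = [t for t in cover if t != s]
        if set().union(*(sets[t] for t in rest)) >= universe:
            cover = rest
    return cover

cover = frequency_cover(SETS, WEIGHTS)
print(cover, sum(WEIGHTS[s] for s in cover))
```

On the instance of Example 7, where every hyperedge lies in at most two sets (f = 2), the first phase takes $S_4$, $S_1$, $S_2$, $S_5$ and $S_3$, of weight 19, within the bound $2 \times 10$; the pruning pass then reduces this to the optimal cover $\{S_2, S_3\}$.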

Theorem 6. There is an approximation algorithm that, for a given database instance D with local ICs IC, returns a fix D(C) such that $\Delta(D, D(C)) \leq c \times \Delta(D, D')$, where c is a constant and D' is any LS-fix of D. □

6 One Atom Denials and Conjunctive Queries

In this section we concentrate on the common case of one database atom denials (1AD), i.e. of the form $\forall \bar{x}\, \neg(A \wedge B)$, where atom A has a predicate in $\mathcal{R}$, and B is a conjunction of built-in atoms. They capture range constraints; and census data is usually stored in single relation schemas [10].

For 1ADs, we can identify tractable cases for CQA under LS-fixes by reduction to CQA for (tuple and set-theoretic) repairs of the form introduced in [2] for key constraints. This is because each violation set (cf. Definition 6) contains one tuple, maybe with several local fixes, but all sharing the same key values; and then the problem consists in choosing one from different tuples with the same key values (cf. proof of Theorem 7). The transformation preserves consistent answers to both ground and open queries.

The “classical” (tuple- and set-oriented) repair problem as introduced in [2] has been studied in detail for functional dependencies in [7, 11]. In particular, for tractability of CQA in our setting, we can use results and algorithms obtained in [11] for the classical framework.

The join graph $\mathcal{G}(Q)$ [11] of a conjunctive query Q is a directed graph whose vertices are the database atoms in Q. There is an arc from L to L' if $L \neq L'$ and there is a variable w that occurs at the position of a non-key attribute in L and also occurs in L'. Furthermore, there is a self-loop at L if there is a variable that occurs at the position of a non-key attribute in L, and at least twice in L.
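The definition of the join graph can be turned into a few lines of Python. The sketch below is a hypothetical helper, assuming every term of the query is a variable and that each atom carries the (0-based) positions of its key attributes; it only builds the arcs and self-loops, and does not check the forest or full-join conditions used in the definition of $C_{\mathit{forest}}$.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    label: str               # identifies the atom within the query
    args: tuple[str, ...]    # all terms are assumed to be variables
    key: frozenset[int]      # 0-based positions of the key attributes

def join_graph(atoms):
    """Return (arcs, self_loops) of the join graph of a conjunctive query."""
    arcs, self_loops = set(), set()
    for a in atoms:
        non_key_vars = {v for i, v in enumerate(a.args) if i not in a.key}
        # self-loop: a non-key variable of a occurs at least twice in a
        if any(a.args.count(v) >= 2 for v in non_key_vars):
            self_loops.add(a.label)
        # arc a -> b: a non-key variable of a also occurs in another atom b
        for b in atoms:
            if b.label != a.label and non_key_vars & set(b.args):
                arcs.add((a.label, b.label))
    return arcs, self_loops

query = [Atom("R", ("x", "y"), frozenset({0})), Atom("S", ("y", "z"), frozenset({0}))]
arcs, loops = join_graph(query)
print(arcs, loops)
```

For R(x, y) and S(y, z) with keys on the first position, the non-key variable y of R occurs in S, so there is an arc from R to S and none back; an atom such as T(x, y, y) with key x gets a self-loop.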



When Q does not have repeated relation symbols, we write $Q \in C_{\mathit{forest}}$ if $\mathcal{G}(Q)$ is a forest and every non-key to key join of Q is full, i.e. involves the whole key. Classical CQA is tractable for queries in $C_{\mathit{forest}}$ [11].

Theorem 7. For a fixed set of 1ADs and queries in $C_{\mathit{forest}}$, consistent query answering under LS-fixes is in PTIME. □

We may define that an aggregate conjunctive query belongs to $C_{\mathit{forest}}$ if its underlying non-aggregate conjunctive query, i.e. its NAM (cf. Section 2), belongs to $C_{\mathit{forest}}$. Even for 1ADs, with simple comparison aggregate queries with sum, tractability is lost under the brave semantics.

Proposition 9. For a fixed set of 1ADs, and for aggregate queries that are in $C_{\mathit{forest}}$ or acyclic, CQA is NP-hard under the brave semantics. □

For queries Q returning numerical values, which are common in our framework, it is natural to use the range semantics for CQA, introduced in [3] for scalar aggregate queries and functional dependencies under classical repairs. Under this semantics, a consistent answer is the pair consisting of the min-max and max-min answers, i.e. the supremum and the infimum, resp., of the set of answers to Q obtained from LS-fixes. The CQA decision problems under range semantics consist in determining if a numerical query Q, e.g. an aggregate query, has its answer $\leq k_1$ in every fix (min-max case), or $\geq k_2$ in every fix (max-min case).

Theorem 8. For each of the aggregate functions sum, count distinct, and average, there is a fixed set of 1ADs and a fixed aggregate acyclic conjunctive query, such that CQA under the range semantics is NP-hard. □

For the three aggregate functions one 1AD suffices. The results for count distinct and average are obtained by reduction from MAXSAT [20] and 3SAT, resp. For sum, we use a reduction from the Independent Set Problem with bounded degree 3 [13]. The general Independent Set Problem has bad approximation properties [15, Chapter 10]. The Bounded Degree Independent Set has efficient approximations within a constant factor that depends on the degree [14].

Theorem 9. For any set of 1ADs and conjunctive query with sum over a non-negative attribute, there is a polynomial time approximation algorithm with a constant factor for CQA under min-max range semantics. □

The factor in this theorem depends upon the ICs and the query, but not on the 
size of the database. The acyclicity of the query is not required. The algorithm 
is based on a reduction of our problem to satisfying a subsystem with maximum 
weight of a system of weighted algebraic equations over the Galois field with two 
elements GF[2] (a generalization of problems in [12, 23]), for which a polynomial 
time approximation similar to the one for MAXSAT can be given [23]. 
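To give a flavor of the MAXSAT-style approximation mentioned above, here is a sketch restricted to linear equations over GF[2] (each equation states that the XOR of some variables equals 0 or 1); the reduction in the text uses more general algebraic equations, so this is a simplification, and the encoding is our own. It applies the method of conditional expectations, as in Johnson's algorithm for MAXSAT: under a uniformly random assignment, every equation with at least one unassigned variable holds with probability 1/2, and fixing variables one at a time without decreasing this expectation yields an assignment whose satisfied weight is at least half of the optimum.

```python
from fractions import Fraction

# Each equation: (variables, rhs, weight), meaning XOR of the variables == rhs.
EQUATIONS = [((0, 1), 1, 3), ((1,), 0, 2), ((0, 1, 2), 1, 1), ((2,), 1, 4)]

def expected_weight(equations, assignment):
    """Expected satisfied weight if unassigned variables are uniform random bits."""
    total = Fraction(0)
    for variables, rhs, weight in equations:
        if all(v in assignment for v in variables):
            parity = sum(assignment[v] for v in variables) % 2
            total += weight if parity == rhs else 0
        else:
            # a single free variable makes the parity a fair coin flip
            total += Fraction(weight, 2)
    return total

def conditional_expectations(equations, num_vars):
    """Fix variables one by one, never letting the expectation drop."""
    assignment = {}
    for v in range(num_vars):
        e0 = expected_weight(equations, {**assignment, v: 0})
        e1 = expected_weight(equations, {**assignment, v: 1})
        assignment[v] = 0 if e0 >= e1 else 1
    return assignment

solution = conditional_expectations(EQUATIONS, 3)
print(solution, expected_weight(EQUATIONS, solution))
```

On this toy system of total weight 10, the sketch returns an assignment of satisfied weight 7, above the guaranteed 5 (the optimum here is 9).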

7 Conclusions 

We have shown that fixing numerical values in databases poses many new computational challenges that had not been addressed before in the context of consistent query answering. These problems are particularly relevant in census-like applications, where the problem of data editing is a common and difficult task (cf. http://www.unece.org/stats/documents/2005.05.sde.htm). Also, our concentration on aggregate queries is particularly relevant for this kind of statistical application. In this paper we have just started to investigate some of the many problems that appear in this context, and several extensions are in development. We concentrated on integer numerical values, which provide a useful and challenging domain. Considering real numbers in fixable attributes opens many new issues and requires different approaches; it is a subject of ongoing research.

The framework established in this paper could be applied to qualitative attributes with an implicit linear order given by the application. The results we have presented for fixable attributes that are all equally relevant ($\alpha_A = 1$ in Definitions 1 and 2) should carry over without much difficulty to the general case of arbitrary weighted fixes. We have developed (but not reported here) extensions to our approach that consider minimum distribution variation LS-fixes that keep the overall statistical properties of the database. We have also developed optimizations of the approximation algorithm presented in Section 5; its implementation and experiments are ongoing efforts. More research on the impact of aggregation constraints on LS-fixes is needed.

Of course, if instead of the $L_2$ distance the $L_1$ distance is used, we may get for the same database a different set of (now $L_1$) fixes. The actual approximations obtained in this paper change too. However, the general complexity and approximability results should remain. They basically depend on the fact that distance functions are non-negative, additive wrt attributes and tuples, computable in polynomial time, and monotonically increasing. Another possible semantics could allow an epsilon of error in the distance, in such a way that if, for example, the distance of a fix is 5 and the distance of another fix is 5.001, we could take both of them as (minimal) LS-fixes.

Other open problems refer to cases of polynomial complexity for linear denials with more than one database atom; approximation algorithms for the DFOP for non-local cases; and approximations to CQA for other aggregate queries.

For related work, we refer to the literature on consistent query answering (cf. [4] for a survey and references). Papers [24] and [10] are the closest to our work, because changes in attribute values are basic repair actions, but the peculiarities of numerical values and quantitative distances between databases are not investigated. Under the set-theoretic, tuple-based semantics, [7, 6, 11] report on complexity issues for conjunctive queries, functional dependencies and foreign key constraints. A majority semantics was studied in [18] for database merging. Quite recent papers, but under semantics different from ours, report research on fixing numerical values under aggregation constraints [9], and on the heuristic construction of repairs based on attribute value changes [5].
A Appendix

A.1 Proofs

Those auxiliary technical results that are stated in this appendix, but not in the main body of the paper, are numbered in the form A.n, e.g. Lemma A.1.

Proof of Proposition 1: Let p be the square distance between D and D' in Definition 1. The circle of radius p around D intersects the non-empty “consistent” region that contains the database instances with the same schema and key values as D that satisfy IC. Since the circle contains a finite number of instances, the distance takes a minimum in the consistent region. □

The following lemma proves that if a tuple is involved in an inconsistency, 
the set of constraints is consistent and there is at least one fixable attribute in 
each integrity constraint, then there always exists a local fix (see Definition 7) 
for it. 

Lemma A.1. Let D be a database and IC a consistent set of linear denial constraints, where each constraint contains at least one built-in involving a fixable attribute, and there are equalities or joins only between rigid attributes. Then, for every tuple t with at least one fixable attribute and at least one ic in IC with $\mathcal{I}(D, ic, t) \neq \emptyset$, there exists at least one local fix t' (see Definition 7). □

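Proof: Each constraint $ic \in IC$ has the form $\forall \bar{x}\, \neg(P_1(\bar{x}), \ldots, P_n(\bar{x}), A_i < c_i, A_j > c_j, A_k = c_k, A_l \neq c_l, \ldots)$ and can be rewritten as a clause containing only order comparisons and equalities:

$\forall \bar{x}\, (\neg P_1(\bar{x}) \vee \cdots \vee \neg P_n(\bar{x}) \vee A_i \geq c_i \vee A_j \leq c_j \vee A_k < c_k \vee A_k > c_k \vee A_l = c_l \vee \cdots) \qquad (1)$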

This formula shows that, since the repairs are done by attribute updates, the only way of solving an inconsistency is by fixing at least one of the values of a fixable attribute. Let ic be a constraint in IC such that $\mathcal{I}(D, ic, t) \neq \emptyset$, and let I be a violation set, $I \in \mathcal{I}(D, ic, t)$. Now, since $ic \in IC$ and IC is consistent, ic is a consistent constraint. Then for each fixable attribute A in ic we are able to derive an interval $[c_l, c_u]$ such that, if the value of A is in it, we would restore the consistency of I. For example, if we have a constraint in the form of equation (1) with a disjunct $A \leq 5$, then, if we want to restore consistency by modifying A, we would need to have $A \in (-\infty, 5]$. If the constraints also required $A \geq 1$, the interval would be [1, 5]. Since t has at least one fixable attribute and each fixable attribute has an interval, it is always possible to adjust the value of that fixable attribute to a value in the interval $[c_l, c_u]$ and restore consistency. By finding the adjustment that minimizes the distance from the original tuple, we have found a local fix for the tuple t. □
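For a single fixable attribute, the last step of this proof amounts to projecting the current value onto the allowed integer interval. The following Python sketch assumes that each constraint contributes one bound of the form $A \leq c$ or $A \geq c$ on the attribute (already normalized to integers), intersects these bounds into $[c_l, c_u]$, and returns the closest admissible value; it ignores interactions with other attributes and tuples, and the function names are hypothetical. The returned value is either the original one or an end point of the interval, in line with Proposition A.1.

```python
import math

def allowed_interval(bounds):
    """Intersect integer bounds ("<=", c) / (">=", c) on one attribute into
    [lo, hi]; -inf / inf stand for a missing bound."""
    lo, hi = -math.inf, math.inf
    for op, c in bounds:
        if op == "<=":
            hi = min(hi, c)
        elif op == ">=":
            lo = max(lo, c)
        else:
            raise ValueError(f"unsupported comparison: {op}")
    if lo > hi:
        raise ValueError("inconsistent bounds: empty interval")
    return lo, hi

def closest_value(value, bounds):
    """Admissible value with minimal square distance to `value`:
    the value itself if allowed, otherwise the nearest end point."""
    lo, hi = allowed_interval(bounds)
    return min(max(value, lo), hi)

print(closest_value(8, [("<=", 5), (">=", 1)]))
```

With the interval [1, 5] of the example, a value 8 is moved to the border 5 (square distance 9), a value -3 to the border 1, and a value 3 is left unchanged.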

The borders of an attribute in an extended linear denial correspond to the 
surfaces of the semi-spaces determined by the built-in atoms in it. 

Proposition A.1. Given a database D and a set of linear denials IC, where equalities and joins can only exist between rigid attributes, the value in every fixable attribute of a local fix t' (cf. Definition 7) of a tuple $t \in D$ will correspond to the original value in t or to a border of a constraint in IC. Furthermore, the value of every attribute of a tuple $t' \in D'$ will correspond to the original value of the attribute in the tuple in D or to a border of a constraint in IC. □

Proof: First we will replace in all the constraints $X \leq c$ by $X < c + 1$, $X \geq c$ by $X > c - 1$, and $X = c$ by $(X > c - 1 \wedge X < c + 1)$. We can do this because we are dealing with integer values. Then, a constraint ic has the form $\forall \bar{x}\, \neg(P_1(\bar{x}), \ldots, P_n(\bar{x}), A_i < c_i, A_j > c_j, A_k \neq c_k, \ldots)$ and can be rewritten as

$\forall \bar{x}\, (\neg P_1(\bar{x}) \vee \cdots \vee \neg P_n(\bar{x}) \vee A_i \geq c_i \vee A_j \leq c_j \vee A_k = c_k \vee \cdots) \qquad (2)$

This formula shows that, since the repairs are done by attribute updates, the only way of solving an inconsistency is by fixing at least one of the values of a fixable attribute. This implies changing the value of a fixable attribute $A_i$ to something equal to or greater than $c_i$, changing the value of a fixable attribute $A_j$ to a value equal to or smaller than $c_j$, or changing the value of attribute $A_k$ to $c_k$.

If D is consistent wrt IC then there is a unique LS-fix D' = D, all the values are the same as the original ones, and therefore the proposition holds. If D is inconsistent wrt IC then there exists a tuple t with at least one fixable attribute and a set $IC_t \subseteq IC$ such that for every $ic \in IC_t$ it holds $\mathcal{I}(D, ic, t) \neq \emptyset$. If $IC_t$ is an inconsistent set of constraints then there exists no local fix and the proposition holds. If $IC_t$ is consistent but there is at least one constraint with no fixable attributes involved then, since it is not possible to modify any attribute in order to satisfy the constraint, there is no local fix and the proposition holds.

It remains to prove the proposition for $IC_t$ consistent and with at least one fixable attribute for each ic in $IC_t$. From Lemma A.1 we know that there exists a local fix for t. Also, since $IC_t$ is consistent, using the same arguments as in the proof of Lemma A.1, it is possible to define for each fixable attribute A an interval such that, if the value of A is in it, we would restore the consistency of the violation sets for constraints in $IC_t$ involving t. Then, we need to prove that if the value of an attribute, say A, of a local fix t' of t is different from the one in t, then the value corresponds to one of the closed limits of the interval for A. Let us assume that an attribute A is restricted by the constraints to an interval $[c_l, c_u]$ and that the local fix t' takes for attribute A a value strictly smaller than $c_u$ and strictly greater than $c_l$. Without loss of generality we will assume that the value of attribute A in t is bigger than $c_u$. Let t'' be a tuple with the same values as t' except that attribute A is set to $c_u$. Tuple t'' will have the same values in the rigid attributes as t, and also $S(t, t') = S(t, t'')$, since the value of A in t'' is still in the interval. We also have that $\Delta(\{t\}, \{t''\}) < \Delta(\{t\}, \{t'\})$. This implies that t' is not a local fix, and we have reached a contradiction.

For the second part of the proposition, the proof of the first part can be easily extended to prove that the values in D' will correspond to a border of a constraint in IC, because LS-fixes are combinations of local fixes. □

Proof of Theorem 1: Hilbert’s 10th problem on the existence of integer solutions to Diophantine equations can be reduced to our problem. Given a Diophantine equation, it is possible to construct a database D and a set of ICs IC such that the existence of an LS-fix for D wrt IC implies the existence of a solution to the equation, and vice versa. An example can be found in Appendix A.2. □

Proof of Proposition 2: First for the skeptical semantics. Given a database instance D, consider the instance (D, no) for CQA(False, IC, Sk), corresponding to the question “Is there an LS-fix of D wrt IC that does not satisfy False?”; it has answer Yes iff the class of LS-fixes of D is empty. For the majority semantics, for the instance (D, no) for CQA(True, IC, Maj), corresponding to the question “Is it not the case that the majority of the LS-fixes satisfy True?”, we get answer yes iff the set of LS-fixes is empty. □

Proof of Theorem 2: (a) First of all, we notice that a linear denial with implicit equalities, i.e. occurrences of the same variable in two different database atoms, e.g. $\forall X, Y, Z\, \neg(R(X, Y), Q(Y, Z), Z > 3)$, can be replaced by its explicit version with explicit equalities, e.g. $\forall X, Y, Z, W\, \neg(R(X, Y), Q(W, Z), Y = W, Z > 3)$.

Let n be the number of tuples in the database, and l the number of attributes which participate in IC. They are those that appear in built-in predicates in the explicit versions of the ICs and that do not belong to a key or are equal to a key (because they are not allowed to change). For example, given the denial $\neg(P(X, Y), Q(X, Z), Y > 2)$, since its explicit version is $\neg(P(X, Y), Q(W, Z), Y > 2, X = W)$, the number l is 1 (for Y) if X is a key for P or Q, and 3 if X is not a key (for Y, X, W).

If there exists an LS-fix D' with $\Delta(D, D') \leq k$, then no value in a fixable attribute in D' differs from its corresponding value (through the key value) in D by more than $\sqrt{k}$. In consequence, the size of an LS-fix may not differ from the original instance by more than $l \times n \times \mathit{bin}(k)/2$, where $\mathit{bin}(k)$ is the size of the binary representation of k. Thus, the size of an LS-fix is polynomially bounded by the size of D and k. Since we can determine in polynomial time whether D' satisfies the ICs and whether the distance is smaller than k, we obtain the result.
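The bound on the size of the changed values can be spelled out as follows, writing a for the value of a fixable attribute of a tuple in D and a' for the corresponding value in D' (notation introduced here for illustration), and using that $\Delta$ sums the square differences of all fixable values.

```latex
(a - a')^2 \;\le\; \Delta(D, D') \;\le\; k
\quad\Longrightarrow\quad |a - a'| \le \sqrt{k},
\qquad
\mathit{bin}\big(\lfloor \sqrt{k} \rfloor\big) = \big\lceil \mathit{bin}(k)/2 \big\rceil \quad (k \ge 1).
```

The identity on the right holds because, for $L = \lfloor \log_2 k \rfloor$, both sides equal $\lfloor L/2 \rfloor + 1$; so each of the at most $l \times n$ changed values needs only about $\mathit{bin}(k)/2$ additional bits.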



(b) Membership: According to Proposition 1, there is an LS-fix at a square distance $\leq k$ iff there is an instance D' with the same key values that satisfies IC at a square distance $\leq k$. We use Proposition 2.

Hardness: We can reduce Vertex Cover (VC) to $DFP(IC_0)$ for a fixed set of denials $IC_0$. Given an instance $(\mathcal{V}, \mathcal{E}), k$ for VC, consider a database D with a relation E(X, Y) and key {X, Y} for the edges of the graph, and a relation for the vertices V(X, Chosen), where X is the key and attribute Chosen, the only fixable attribute, is initially set to 0. The constraint $IC: \forall X, Y, C_1, C_2\, \neg(E(X, Y) \wedge V(X, C_1) \wedge V(Y, C_2) \wedge C_1 < 1 \wedge C_2 < 1)$ expresses that for any edge, at least one of the incident vertices is covered. A vertex cover of size k exists iff there exists an LS-fix of D wrt IC at a distance $\leq k$. The encoding is polynomial in the size of the original graph. □
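The correspondence between vertex covers and fixes can be checked on small graphs with the brute-force Python sketch below. It assumes the encoding just described (every vertex starts with Chosen = 0) and only tries the values 0 and 1, since raising Chosen to exactly 1 is the cheapest way to satisfy the denial for a vertex; under these assumptions, the smallest square distance of a consistent instance equals the size of a minimum vertex cover. The helper names are hypothetical, and the search is exponential, so it is a sanity check rather than an algorithm.

```python
from itertools import product

def satisfies_ic(edges, chosen):
    """Denial: no edge (x, y) with Chosen(x) < 1 and Chosen(y) < 1."""
    return all(not (chosen[x] < 1 and chosen[y] < 1) for x, y in edges)

def min_fix_distance(vertices, edges):
    """Smallest square distance from the all-zero instance to a consistent one,
    trying Chosen in {0, 1} for every vertex."""
    best = None
    for values in product((0, 1), repeat=len(vertices)):
        chosen = dict(zip(vertices, values))
        if satisfies_ic(edges, chosen):
            distance = sum(v * v for v in values)  # (v - 0)^2 per tuple
            best = distance if best is None else min(best, distance)
    return best

triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(min_fix_distance(["a", "b", "c"], triangle))
```

A triangle needs two covered vertices, so the minimal distance is 2; for the path a-b-c, choosing b alone suffices and the distance is 1.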

Proof of Theorem 3: (a) For hardness, linear denials are good enough. We reduce the graph 3-colorability problem to $NE(IC_0)$, for a fixed set $IC_0$ of ICs. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be an undirected graph with set of vertices $\mathcal{V}$ and set of edges $\mathcal{E}$. Consider the following database schema, instance D, and set $IC_0$ of ICs:

1. Relation Vertex(Id, Red, Green, Blue) with key Id and domain $\mathbb{N}$ for the last three attributes, actually the only three fixable attributes in the database; they can be subject to changes. For each $v \in \mathcal{V}$ we have the tuple (v, 0, 0, 0) in Vertex (and nothing else).

2. Relation $Edge(id_1, id_2)$; for each $e = (v_1, v_2) \in \mathcal{E}$, there are the tuples $(v_1, v_2)$, $(v_2, v_1)$ in Edge. This relation is not subject to any fix.

3. Relation Tester(Red, Green, Blue), with extension (1, 0, 0), (0, 1, 0), (0, 0, 1). This relation is not subject to any fix.

4. Integrity constraints:

$\forall ixyz\, \neg(Vertex(i, x, y, z), x < 1, y < 1, z < 1)$; $\forall ixyz\, \neg(Vertex(i, x, y, z), x > 1)$ (the same for y, z); $\forall ixyz\, \neg(Vertex(i, x, y, z), x = 1, y = 1, z = 1)$; $\forall ixyz\, \neg(Vertex(i, x, y, z), x = 1, y = 1)$; etc.

$\forall ijxyz\, \neg(Vertex(i, x, y, z), Vertex(j, x, y, z), Edge(i, j), Tester(x, y, z))$.

The graph is 3-colorable iff the database has an LS-fix wrt $IC_0$. The reduction is polynomial in the size of the graph. If there is an LS-fix of the generated instance, then the graph is 3-colorable. If the graph is colorable, then there is a consistent instance with the same key values as the original instance; then, by Proposition 1, there is an LS-fix.

For membership, it suffices to prove that if an LS-fix exists, then its square 
distance to D is polynomially bounded by the size of D, considering both the 
number of tuples and the values taken by the fixable attributes. 

We will show that if an LS-fix D' exists, then all the values in its fixable attributes are bounded above by the maximum of $n_1 + n + 1$ and $n_2 + n + 1$, where n is the number of tuples in the database, $n_1$ is the maximum absolute value in a fixable attribute in D, and $n_2$ is the maximum absolute value of a constant appearing in the ICs.

The set of denial ICs put in disjunctive form gives us a representation for all the ways we have to restore the consistency of the database. So, we have a constraint of the form $\varphi_1 \wedge \varphi_2 \wedge \cdots$, where each $\varphi_i$ is a disjunction of negated database atoms and inequalities, e.g. something like $\neg P(X, Y, Z) \vee \neg R(X_1, Y_1) \vee X < c_1 \vee Y < c_2 \vee Z \neq Y_1$. Since fixes can be obtained by changing values of non-key attributes, each tuple in a fix is determined by a set of constraints, each of which is a disjunction of atoms of the form $X_i \theta_i c_m$ or $X_i \neq Y_j$, where $\theta_i$ is one of $<, >, \leq, \geq$. E.g. from $\neg P(X, Y, Z) \vee \neg R(X_1, Y_1) \vee X < c_1 \vee Y < c_2 \vee Z \neq Y_1$ we get $X < c_1 \vee Y < c_2 \vee Z \neq Y_1$, which for a specific tuple becomes $Y < c_2 \vee Z \neq Y_1$ if X is part of the key and its specific value for the tuple at hand does not satisfy $X < c_1$ (otherwise we drop the constraint for that tuple). In any case, every tuple in a fix can take values in a space S that is the intersection of the half-spaces defined by inequalities of the form $X_i \theta_i c_m$, minus the set of points determined by the non-equalities $X_i \neq Y_j$.

If there is a set of values that satisfies the resulting constraints, i.e. if there is an instance with the same key values that satisfies the ICs, then we can find an LS-fix at the right distance: if the difference between any value and $\max(c_1, \ldots, c_l)$ is more than n + 1 (the most we need to be sure the inequalities $X_i \neq Y_j$ are satisfied), then we systematically change values by 1, making them closer to the borders of the half-spaces, but still keeping the points within S.

(b) coNP-hardness follows from Proposition 2 and part (a). □

Proof of Theorem 4: (a) We reduce 3-SAT’s complement to LS-fix checking for a fixed schema and set of denials IC. We have a table $Lit(l, \bar{l})$ storing complementary literals (only), e.g. $(p, \neg p)$ if p is one of the variables in the instance for SAT. Also a table Cl storing tuples of the form $(\varphi, l, k)$, where $\varphi$ is a clause (we assume all the clauses have exactly 3 literals, which can be simulated by adding extra literals with unchangeable value 0 if necessary), l is a literal in the clause, and k takes value 0 or 1 (the truth value of l in $\varphi$). The first two arguments are the key of Cl. Finally, we have a table Aux(K, N), with key K and fixable numerical attribute N, and a table Num(N) with a rigid numerical attribute N.

Given an instance $\varphi_1 \wedge \cdots \wedge \varphi_m$ for 3-SAT, we produce an initial extension D for the relations in the obvious manner, assigning arbitrary truth values to the literals, but making sure that the same literal takes the same truth value in every clause, and complementary literals take complementary truth values. Aux contains (0, 0) as its only tuple; and Num contains (s + 1), where s is the number of different propositional variables in the formula.

Consider now the following set of denials:

(a) $\neg(Cl(\varphi, L, U), U > 1)$; $\neg(Cl(\varphi, L, U), U < 0)$ (possible truth values).

(b) $\neg(Cl(\varphi, L, U), Cl(\psi, L, V), U \neq V)$ (same value for a literal everywhere).

(c) $\neg(Cl(\varphi, L, U), Cl(\psi, L', V), Lit(L, L'), U = V)$ (complementary literals).

(d) $\neg(Cl(\varphi, L, U), Cl(\varphi, L', V), Cl(\varphi, L'', W), U = V = W = 0, L \neq L', \ldots, Aux(K, N), N = 0)$ (each clause becomes true).

(e) $\neg(Num(Z), Aux(K, N), N \neq 0, N \neq Z)$ (possible values).

It holds that the formula is unsatisfiable iff the instance D' that coincides with D except for Aux, which now has the only tuple (0, s + 1), is an LS-fix of D wrt IC. Thus, checking D' for LS-fix is enough to check unsatisfiability.



For membership in coNP, for an initial instance D, instances D' in the complement of Fix(IC) have witnesses D'' that can be checked in polynomial time, namely instances D'' that have the same key values as D and satisfy the ICs, but with $\Delta(D, D'') < \Delta(D, D')$.

(b) For the first claim on CQA, let IC and a query Q be given. The complement of CQA is in $NP^{coNP}$: given an instance D, non-deterministically choose an instance D' with $D' \not\models Q$ and D' a fix of D. The latter test can be done in coNP (by part (a)). But $NP^{coNP} = NP^{NP} = \Sigma^P_2$. In consequence, CQA belongs to $co\Sigma^P_2 = \Pi^P_2$.

For the second claim, we prove hardness of CQA by a reduction from the following problem [16, Theo. 3.4]: Given a Boolean formula $\psi(X_1, \ldots, X_n)$ in 3CNF, decide if the last variable $X_n$ is equal to 1 in the lexicographically maximum satisfying assignment (the answer is false if $\psi$ is not satisfiable).

Given the clauses $C_1, \ldots, C_m$ in $\psi$, we create a database D with relations $Var(id, tr, fa, weight)$, $Cl(id, var_1, val_1, var_2, val_2, var_3, val_3)$ and constraints:

1. $\forall id, tr, fa\, \neg(Var(id, tr, fa, \_) \wedge tr \leq 0 \wedge fa \leq 0)$

2. $\forall id, tr, fa\, \neg(Var(id, tr, fa, \_) \wedge tr \geq 1 \wedge fa \geq 1)$

3. $\forall id, tr, fa, w\, \neg(Var(id, tr, fa, w) \wedge fa = 1 \wedge w > 0)$

4. $\forall id, v_1, x_1, v_2, x_2, v_3, x_3\, \neg(Cl(id, v_1, x_1, v_2, x_2, v_3, x_3) \wedge Var(v_1, x_1', \_, \_) \wedge Var(v_2, x_2', \_, \_) \wedge Var(v_3, x_3', \_, \_) \wedge x_1 \neq x_1' \wedge x_2 \neq x_2' \wedge x_3 \neq x_3')$

The extended denial constraint in 4. could be replaced by eight non-extended 
denial constraints. 

For each variable $X_i$, insert a tuple $(X_i, 0, 0, 2^{n-i})$ into Var. In binary encoding, the values $2^{n-i}$ are polynomial in the size of the original formula. For each clause $C_i$ with literals over the variables $X_{i_1}, X_{i_2}, X_{i_3}$, insert a tuple $(C_i, X_{i_1}, l_{i_1}, X_{i_2}, l_{i_2}, X_{i_3}, l_{i_3})$ into Cl, where $l_{i_j}$ is equal to 1 in case of a positive occurrence of variable $X_{i_j}$ in $C_i$ and equal to 0 for a negative occurrence. For example, for a clause $C_j = X_p \vee \neg X_q \vee X_r$, we insert $(C_j, X_p, 1, X_q, 0, X_r, 1)$.

It is easy to see that all the fixes of D represent satisfying assignments for $\psi$, such that in case a tuple $(X_i, 1, 0, \_)$ is in a fix, value true must be assigned to variable $X_i$, and value false must be assigned to $X_i$ in case $(X_i, 0, 1, \_)$ is in the fix. If $\psi$ is unsatisfiable, then there are no fixes.

Let us now consider the cost of a repair. Assume that $S_1$ and $S_2$ are satisfying assignments with $S_1 < S_2$ under lexicographical order. Since $S_1 \neq S_2$, there exists an integer m such that $S_1$ and $S_2$ agree on $X_j$ for all $j < m$ and differ on $X_m$; by definition of lexicographical order, $S_1$ assigns false and $S_2$ assigns true to $X_m$. The cost of the repair representing an assignment S is equal to $\sum_{i \,:\, S(X_i) = \mathit{false}} 2^{2(n-i)} + n$, because (a) we have to update attribute weight (from $2^{n-i}$ to 0, by constraint 3) for each variable $X_i$ that is assigned value false, and (b) for each tuple in relation Var, attribute tr or attribute fa is changed by 1.

Because of (a), there exists a term $2^{2(n-m)}$ in the cost of $S_1$ that is bigger than the sum of all terms in the cost of $S_2$ with index $> m$, since $\sum_{i > m} 2^{2(n-i)} = (2^{2(n-m)} - 1)/3$, while the terms with index $< m$ coincide. So, the cost of the fix representing $S_1$ is greater than the cost of the fix representing $S_2$. In consequence, the minimal fix is the one representing the maximum, according to the lexicographical order, of the satisfying assignments.



The answer to the ground atomic query $Var(X_n, 1, 0, 1)$ is true iff $X_n$ takes the value 1 in the lexicographically greatest assignment. □
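The cost argument in this proof can be checked on a small formula with the following Python sketch. It assumes the square-distance reading of the construction: a false variable $X_i$ has its weight $2^{n-i}$ set to 0, at cost $(2^{n-i})^2 = 4^{n-i}$, and every variable pays 1 for the change of tr or fa. Clauses are given as lists of signed integers (a DIMACS-like convention of the sketch), and the example uses 2-literal clauses for brevity, although the reduction works with 3CNF. The helper names are hypothetical.

```python
from itertools import product

def repair_cost(assignment):
    """Square distance of the fix representing a 0/1 assignment (x_1, ..., x_n):
    the weight 2^(n-i) of each false variable drops to 0, costing 4^(n-i),
    plus 1 per variable for changing tr or fa."""
    n = len(assignment)
    return sum(4 ** (n - i) for i, x in enumerate(assignment, start=1) if x == 0) + n

def satisfies(assignment, clauses):
    """Literal k means x_k, literal -k means (not x_k)."""
    return all(
        any((assignment[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )

def cheapest_assignment(n, clauses):
    """Satisfying assignment whose representing fix has minimal cost."""
    models = [a for a in product((0, 1), repeat=n) if satisfies(a, clauses)]
    return min(models, key=repair_cost) if models else None

clauses = [[1, 2], [-1, -2], [2, 3]]
print(cheapest_assignment(3, clauses))
```

For $(X_1 \vee X_2) \wedge (\neg X_1 \vee \neg X_2) \wedge (X_2 \vee X_3)$ the satisfying assignments are (1, 0, 1), (0, 1, 0) and (0, 1, 1), with costs 7, 20 and 19, so the cheapest fix represents the lexicographically greatest assignment (1, 0, 1).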

Proof of Theorem 5: The reduction can be established with a fixed set $IC_0$ of ICs. Given an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, consider a database with relations V(X, Z), E(U, W), where X is a key and Z is the only fixable attribute and takes values in {0, 1} (which can be enforced by means of the linear denials $\forall X \forall Z\, \neg(V(X, Z), Z > 1)$, $\forall X \forall Z\, \neg(V(X, Z), Z < 0)$ in $IC_0$). Intuitively, Z indicates with 1 if the vertex X is in the cover, and with 0 otherwise. Attributes U, W are vertices and thus non-numerical.

In the original database D we have the tuples V(e, 0), with $e \in \mathcal{V}$; and also the tuples $E(e_1, e_2)$ for $(e_1, e_2) \in \mathcal{E}$. Given the linear constraint

$\forall X_1 Z_1 X_2 Z_2\, \neg(V(X_1, Z_1), V(X_2, Z_2), E(X_1, X_2), Z_1 = 0, Z_2 = 0)$

in $IC_0$, the LS-fixes of the database are in one-to-one correspondence with the vertex covers of minimal cardinality.

For the aggregate query q(sum(Z)), sum(Z) < k, with $q(sum(Z)) \leftarrow V(X, Z)$, the instance (D, yes) for consistent query answering under brave semantics has answer No (i.e. the query is false in all LS-fixes) only for every k smaller than the minimum cardinality c of a vertex cover. □

In the previous proof, the first two ICs in $IC_0$ can be eliminated, and the third one is local (cf. Definition 5). In consequence, the theorem also holds for local denials.

Proof of Proposition 3: By reduction from the MAXSNP-hard problem B-Minimum Vertex Cover (BMVC), which asks to find a minimum vertex cover in a graph whose nodes have a bounded degree [15, chap. 10]. We start by encoding the graph as in the proof of Theorem 5. We also use the same initial database D. Every LS-fix D' of D corresponds to a minimum vertex cover V' for $\mathcal{G}$ and vice versa, and it holds that $|V'| = \Delta(D, D')$. This gives us an L-reduction from BMVC to DFP [20]. □

Lemma A.2. Given a database D and a set of consistent local denials IC, there 
will always exist an LS-fix D' of D wrt IC. □ 

Proof: As shown in the proof of Lemma A.1, for every fixable attribute in F it is possible to define, using the integrity constraints in IC, an interval $[c_l, c_u]$ such that if the value of attribute A is in that interval there is no constraint $ic \in IC$ with a built-in involving A such that $\mathcal{I}(D, ic, t) \neq \emptyset$. Let D'' be a database constructed in the following way: for every tuple $t \in D$ such that the value of a fixable attribute does not belong to its interval, replace its value by any value in the interval. Clearly D'' will be a fix, but not necessarily an LS-fix. By Proposition 1 we know there exists an LS-fix D' for D wrt IC. □

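Definition 11. Given a database D and a set of ICs IC, a local fix t' for a tuple t does not generate new violations if

$$\bigcup_{ic \in IC}\Big(\bigcup_{t_1 \in D'} \mathcal{I}(D', ic, t_1) \setminus \bigcup_{t_2 \in D} \mathcal{I}(D, ic, t_2)\Big) = \emptyset$$

for $D' = (D \setminus \{t\}) \cup \{t'\}$. □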



Lemma A.3. For a set IC of local denials, if t' is a local fix of a tuple t, then t' does not generate new violations (cf. Definition 11) in database D wrt IC. Furthermore, this also holds for t' a “relaxed” local fix, where the distance to t is not necessarily minimal. □

Proof: Tuple t' can only differ from t in the values of fixable attributes. Let us assume that one of the modified values was for an attribute A. Since we have local constraints, attribute A can only appear in the constraints related either to $<$ and $\leq$ or to $>$ and $\geq$, but not both. Without loss of generality, we will assume that the constraint is written as in equation (1) and that A is related only to $>$ and $\geq$. Since t' is a local fix, $S(t, t')$ is not empty and there is a set $IC_t$ of constraints for which t' solves the inconsistency in which t was involved. The limits given in $IC_t$ determine an interval $[c_l, +\infty)$ for A containing the values of A that would force the satisfaction of the constraints in $IC_t$ that have attribute A in an inequality. This shows that the value of attribute A in t' is bigger than the value of A in t.

For $D' = (D \setminus \{t\}) \cup \{t'\}$ we need to prove that

$$\bigcup_{ic \in IC}\Big(\bigcup_{t_1 \in D'} \mathcal{I}(D', ic, t_1) \setminus \bigcup_{t_2 \in D} \mathcal{I}(D, ic, t_2)\Big) = \emptyset.$$

By contradiction, let us assume that for a constraint $ic \in IC$ there exists a violation set I such that $I \in \bigcup_{t_1 \in D'} \mathcal{I}(D', ic, t_1)$ and $I \notin \bigcup_{t_2 \in D} \mathcal{I}(D, ic, t_2)$. There are two cases to consider (with $(I, ic)$ we indicate that I is a violation set for ic):

- $(I, ic) \in S(t, t')$. Then $I \in \mathcal{I}(D, ic, t)$, but since we assumed $I \notin \bigcup_{t_2 \in D} \mathcal{I}(D, ic, t_2)$, this is not possible.

- $(I, ic) \notin S(t, t')$. Then we have two possibilities: $I \notin \mathcal{I}(D, ic, t)$ or $((I \setminus \{t\}) \cup \{t'\}) \not\models ic$.

• Let us consider first that $I \notin \mathcal{I}(D, ic, t)$. We have that $I \in \bigcup_{t_1 \in D'} \mathcal{I}(D', ic, t_1)$, and since t' is the only difference between D and D', we have $I \in \mathcal{I}(D', ic, t')$. Since all the constraints can only have attribute A with $>$ or $\geq$, we know that in particular ic does. Since $I \notin \mathcal{I}(D, ic, t)$, we know that A satisfied the condition in ic, and since t' has a bigger value of A than t, it is not possible to generate an inconsistency in D'. We have reached a contradiction.

• Let us now consider $((I \setminus \{t\}) \cup \{t'\}) \not\models ic$. Then $I \in \mathcal{I}(D', ic, t')$. From our assumption, $I \notin \bigcup_{t_2 \in D} \mathcal{I}(D, ic, t_2)$. This is the same situation analyzed in the previous item.

In all the cases we have reached a contradiction, and therefore the lemma is proved. Since we never used the property of minimal distance between t' and t, the second part of the lemma is also proved. □

Proposition A.2. For local denials there always exists an LS-fix of a database D; and for every LS-fix D', $D' \setminus D$ is a set of local fixes. Furthermore, for each violation set $(I, ic)$, there is a tuple $t \in I$ and a local fix t' for t such that $(I, ic) \in S(t, t')$. □

Proof: Since each attribute A can only be associated with $<$ or $>$ built-ins, but not both, it is clear that a set of local denials is always consistent. By Lemma A.2, there always exists an LS-fix D'. Now we need to prove that $D' \setminus D$ is a set of local fixes. By contradiction, assume that $t' \in (D' \setminus D)$ is not a local fix of the tuple t. This can happen in the following situations:

- t was consistent. From Lemma A.3 we know that no new inconsistencies can be added by the modifications done to the other tuples, and therefore t is not related to any inconsistency. Then $D^* = (D' \setminus \{t'\}) \cup \{t\}$ is also consistent and $\Delta(D, D^*) < \Delta(D, D')$. But D' is an LS-fix, so this is not possible.

- t is involved in at least one violation set. If $S(t, t') = \emptyset$, then t' does not solve any violation set, and therefore $D^* = (D' \setminus \{t'\}) \cup \{t\}$ is also consistent and $\Delta(D, D^*) < \Delta(D, D')$. But D' is an LS-fix, so this is not possible. Now, if $S(t, t') \neq \emptyset$, from Lemma A.2, considering the database $\{t\}$ and the set of constraints $\{ic \mid (I, ic) \in S(t, t')\}$, there exists an LS-fix of $\{t\}$, i.e. there exists a local fix t'' that solves all the violation sets in $S(t, t')$. Since t'' is a local fix, we know that $\Delta(\{t\}, \{t''\}) \leq \Delta(\{t\}, \{t'\})$. They cannot be equal, since that would imply that t' is a local fix, and it is not. Then $D^* = (D' \setminus \{t'\}) \cup \{t''\}$ is also consistent and $\Delta(D, D^*) < \Delta(D, D')$. Again, this is not possible because D' is an LS-fix.

The second part of the proposition can be proved using Lemma A.2 and considering the database $D = I$ and the set of constraints $IC = \{ic\}$. □

Proof of Proposition 4: For the first claim, membership follows from Theorem 
2(b); and for hardness, we can do the same reduction as in Theorem 2(b), because 
the ICs used there are local denials. For the second claim, it is not difficult to 
see that the non-local denials in the proof of Proposition 3 can be eliminated. □ 

Proposition A.3. For a database D and a set of local denial constraints IC:

1. For a set of local fixes $\{t_1, \ldots, t_n\}$ of a tuple t there always exists a local fix $t^*$ such that $S(t, t^*) = \bigcup_{i=1}^{n} S(t, t_i)$.

2. For local fixes t', t'' and t''' of a tuple t with $S(t, t''') = S(t, t') \cup S(t, t'')$, it holds that $\Delta(\{t\}, \{t'''\}) \leq \Delta(\{t\}, \{t'\}) + \Delta(\{t\}, \{t''\})$. □

Proof: First we prove item (1). Let $IC_t = \{ic \mid \mathcal{I}(D, ic, t) \neq \emptyset\}$ and $IC_{S(t,t')} = \{ic \mid (I, ic) \in S(t, t')\}$. From Lemma A.2, considering the database $\{t\}$ and any subset of $IC_t$ as the set of constraints, there always exists an LS-fix of $\{t\}$. This LS-fix is a local fix t' of tuple t whose $IC_{S(t,t')}$ is exactly that subset. Since we can find a local fix for any subset of $IC_t$, item (1) holds.

Now we prove item (2). If the fixable attributes that were modified in t' and t'' are disjoint, then the tuple t''' obtained by combining both modifications satisfies $\Delta(\{t\}, \{t'''\}) = \Delta(\{t\}, \{t'\}) + \Delta(\{t\}, \{t''\})$. Now consider the case where t' and t'' have at least one fixable attribute, say A, that is modified by both local fixes. In this case t''' will have a value in A that solves the inconsistencies solved by both t' and t''. This value will in fact correspond to the value of A in t' or in t'', and therefore we will have $\Delta(\{t\}, \{t'''\}) < \Delta(\{t\}, \{t'\}) + \Delta(\{t\}, \{t''\})$. Letting M be the set of attributes that are modified by both t' and t'', we can express the relation as follows:

$$\Delta(\{t\},\{t'''\}) = \sum_{A \in \mathcal{F}} (\pi_A(t) - \pi_A(t'))^2 + \sum_{A \in \mathcal{F}} (\pi_A(t) - \pi_A(t''))^2 - \sum_{A \in M} \min\{(\pi_A(t) - \pi_A(t'))^2, (\pi_A(t) - \pi_A(t''))^2\}.$$

□
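A small numeric check of this identity, written as a Python sketch under the assumption used in the proof that local fixes only increase attribute values (so the combined fix t''' takes, on each attribute, the larger of the two values). The attribute names A, B, C and all values are hypothetical.

```python
def sq_dist(t, u):
    # Squared Euclidean distance over the (fixable) attributes of t.
    return sum((t[a] - u[a]) ** 2 for a in t)


def combine(t, t1, t2):
    # Assumes both local fixes only increase values (the '>' side in the proof):
    # on each attribute the larger value solves the violations of both fixes.
    return {a: max(t1[a], t2[a]) for a in t}


t = {"A": 0, "B": 0, "C": 0}
t1 = {"A": 3, "B": 2, "C": 0}   # t'  modifies A and B
t2 = {"A": 1, "B": 0, "C": 4}   # t'' modifies A and C
t3 = combine(t, t1, t2)         # t''' = {"A": 3, "B": 2, "C": 4}

M = [a for a in t if t1[a] != t[a] and t2[a] != t[a]]  # modified by both
rhs = sq_dist(t, t1) + sq_dist(t, t2) - sum(
    min((t[a] - t1[a]) ** 2, (t[a] - t2[a]) ** 2) for a in M
)
print(sq_dist(t, t3), rhs, sq_dist(t, t1) + sq_dist(t, t2))  # 29 29 30
```

Since A is modified by both fixes, the inequality of item (2) is strict here (29 < 30).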



Proposition A.4. If an optimal cover C for the instance $(U, \mathcal{S})$ of MWSCP has more than one $S(t, t')$ for a tuple t, then there exists another optimal cover C' for $(U, \mathcal{S})$ with the same total weight as C but with only one t' such that $S(t, t') \in C'$. Furthermore, D(C) is an LS-fix of D wrt IC with $\Delta(D, D(C))$ equal to the total weight of the cover C. □

Proof: To prove the first part, let us assume that $S(t, t'), S(t, t'') \in C$. From Proposition A.3 there exists an $S(t, t''') \in \mathcal{S}$ such that $S(t, t''') = S(t, t') \cup S(t, t'')$, i.e. such that it covers the same elements as $S(t, t')$ and $S(t, t'')$. Also from Proposition A.3, $\Delta(\{t\}, \{t'''\}) \leq \Delta(\{t\}, \{t'\}) + \Delta(\{t\}, \{t''\})$, and therefore the weight of $S(t, t''')$ is smaller than or equal to the sum of the weights of the original two sets. If $\Delta(\{t\}, \{t'''\}) < \Delta(\{t\}, \{t'\}) + \Delta(\{t\}, \{t''\})$, C would not be an optimal solution, so this is not possible. Then $\Delta(\{t\}, \{t'''\}) = \Delta(\{t\}, \{t'\}) + \Delta(\{t\}, \{t''\})$. Hence, if we define $C' = (C \setminus \{S(t, t'), S(t, t'')\}) \cup \{S(t, t''')\}$, we cover all the elements and obtain the same optimal weight.

Now we need to prove that D(C) is an LS-fix. D(C) is obtained by first calculating C', and therefore we have an optimal cover with at most one $S(t, t')$ for each tuple t. Then D(C) is obtained by replacing t by t' for each $S(t, t') \in C'$. It is direct that D(C) has the same schema as D and that it satisfies the key constraints. Now, since C' covers all the elements, all the inconsistencies in D are solved in D(C). From Lemma A.3 the local fixes t' do not add new violations, and therefore $D(C) \models IC$ and D(C) is a fix. It remains to prove that D(C) minimizes the distance from D. Clearly

$$\Delta(D, D(C)) = \sum_{S(t,t') \in C'} \Delta(\{t\}, \{t'\}) = \sum_{S(t,t') \in C'} w_{S(t,t')} = w.$$

So, since the optimal solution minimizes w, $\Delta(D, D(C))$ is minimum and D(C) is an LS-fix. □

Proof of Proposition 5: From Propositions A.3 and A.4. □

Proof of Proposition 6: To prove it, it is enough to construct this optimal cover. Let $C = \{S(t, t') \mid t' \in (D' \setminus D)\}$. By definition $C' = C$ and $D(C) = D'$. We need to prove that C is an optimal cover. Since D' is consistent, all the violation sets were solved, and therefore C is a cover. Also, since $\Delta(D, D') = \Delta(D, D(C)) = w$ and $\Delta(D, D')$ is minimum, C minimizes the weight and therefore is an optimal cover. □

Proof of Proposition 7: We have to establish that the transformation of DFOP into MWSCP given above is an L-reduction [20]. So, it remains to verify that the reduction can be done in polynomial time in the size of instance D for DFP(IC), i.e. that $\mathcal{G}$ can be computed in polynomial time in n, the number of tuples in D. Notice that if $m_i$ is the number of database atoms in $ic_i \in IC$, and m is the maximum value of $m_i$, there are at most $n^{m_i}$ hyperedges associated with $ic_i \in IC$, each of them having between 1 and m tuples. We can check that the number of sets and their weights are polynomially bounded by the size of D. There is one $S(t, t')$ for each local fix. Each tuple may have no more than $|\mathcal{F}| \times |IC|$ local fixes, where $\mathcal{F}$ is the set of fixable attributes.

The weight of each S{t, t') is polynomially bounded by the maximum absolute 
value in an attribute in the database and the maximum absolute value of a 
constant appearing in IC (by an argument similar to the one given in the proof 
of Proposition 2). 



With respect to D(C), the number of sets in $\mathcal{S}$ is polynomially bounded by the size of D, and since $C \subseteq \mathcal{S}$, C is also polynomially bounded by the size of D. To generate C it is necessary to search through $\mathcal{S}$. Finally, in order to replace t in D for each tuple t' such that $S(t, t') \in C$, we need to search through D. □

Proof of Proposition 8: Using the same arguments as in the proof of Proposition A.4, since C is a cover, D(C) is a fix of D wrt IC. We need to prove that $\Delta(D, D(C)) \leq \log(N) \times \Delta(D, D')$. We know that $\Delta(D, D(C)) = \sum_{S(t,t^*) \in C^*} \Delta(\{t\}, \{t^*\})$. As described in Definition 10, $C^*$ is obtained from C by replacing, for each t, all the sets $S(t, t_i) \in C$ by a unique set $S(t, t^*)$ such that $S(t, t^*) = \bigcup_i S(t, t_i)$. Since we are using the Euclidean distance to calculate the local fixes, $\Delta(\{t\}, \{t^*\}) \leq \sum_i \Delta(\{t\}, \{t_i\})$. Then

$$\Delta(D, D(C)) \leq \sum_{S(t,t_i) \in C} \Delta(\{t\}, \{t_i\}) = w.$$

Thus, $\Delta(D, D(C)) \leq w \leq \log(N) \times w^o = \log(N) \times \Delta(D, D')$, for every LS-fix D' of D. □
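The proof only needs some cover C whose weight is within a logarithmic factor of the optimum. A standard way to obtain one is the greedy heuristic for weighted set cover; the Python sketch below assumes that this is the algorithm in use (the text here does not fix it) and runs it on a hypothetical MWSCP instance whose sets are keyed by (tuple, local fix). It also groups the chosen sets per tuple, which is where Definition 10 would merge several $S(t, t_i)$ into a single $S(t, t^*)$.

```python
from collections import defaultdict


def greedy_weighted_set_cover(universe, sets):
    """Greedy weighted set cover. `sets` maps a name to (frozenset, weight).
    Raises ValueError if some element of `universe` cannot be covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the set with the smallest weight per newly covered element.
        name = min(
            (n for n, (elems, _) in sets.items() if elems & uncovered),
            key=lambda n: sets[n][1] / len(sets[n][0] & uncovered),
        )
        chosen.append(name)
        uncovered -= sets[name][0]
    return chosen


# Hypothetical instance: elements are violation sets; S(t, t') is keyed by
# (tuple id, local-fix id) and weighted by Delta({t}, {t'}).
universe = {"v1", "v2", "v3"}
sets = {
    ("t", 1): (frozenset({"v1", "v2"}), 4),
    ("t", 2): (frozenset({"v3"}), 1),
    ("s", 1): (frozenset({"v2", "v3"}), 4),
    ("s", 2): (frozenset({"v1"}), 3),
}
chosen = greedy_weighted_set_cover(universe, sets)
total = sum(sets[c][1] for c in chosen)

# Tuples with several chosen local fixes are the ones merged into S(t, t*).
by_tuple = defaultdict(list)
for tup, fix in chosen:
    by_tuple[tup].append(fix)
print(chosen, total, dict(by_tuple))  # [('t', 2), ('t', 1)] 5 {'t': [2, 1]}
```

In this run both chosen sets belong to tuple t, so $C^*$ would replace them by one set $S(t, t^*)$ covering v1, v2 and v3.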

Proof of Theorem 7: Based on the tractability results in [11], it suffices to 
show that the LS-fixes for a database D are in one-to-one and polynomial time 
correspondence with the repairs using tuple deletions [2, 7] for a database D' 
wrt a set of key dependencies. 

Since we have IADs, the violation sets have a single element: for a tuple t that is inconsistent wrt a constraint $ic \in IC$, the only violation set in $\mathcal{I}(D, ic, t)$ is $\{t\}$. Since all the violation sets are independent, in order to compute an LS-fix for D we have to generate independently all the local fixes t' for all inconsistent tuples t such that $(\{t\}, ic) \in S(t, t')$, with $ic \in IC$, and then combine them in all possible ways.

Those local fixes can be found by considering all the candidate fixes (not necessarily LS-minimal) that can be obtained by combining all the possible limits for each attribute provided by the ICs (cf. Proposition A.1), then checking which of them satisfy IC, and finally choosing those that minimize $\Delta(\{t\}, \{t'\})$. There are at most $2^{|\mathcal{F}|}$ possible candidate fixes, where $\mathcal{F}$ is the set of fixable attributes.

Let us now define a database D' consisting of the consistent tuples in D 
together with all the local fixes of the inconsistent tuples. By construction, D 
and D' share the same keys. Since each inconsistent tuple in D may have more 
than one local fix, D' may become inconsistent wrt its key constraints. Each 
repair for D', obtained by tuple deletions, will choose one local fix for each 
inconsistent tuple t of D, and therefore will determine an LS-fix of D wrt IC. □ 
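The candidate-enumeration step can be sketched in Python as follows. Our assumptions: an IAD is encoded as a list of (attribute, operator, integer constant) built-ins that are jointly violated, and the candidate values for an attribute are its original value plus, for each built-in on it, the closest integer that falsifies that built-in. This brute force may try more combinations than the $2^{|\mathcal{F}|}$ bound mentioned above; it returns the consistent candidates at minimum squared distance. The example reuses the Selection tuple (a, 0, 0) and IADs in the style of the proof of Proposition 9, with extra IADs mirroring the 0/1 domain.

```python
import operator
from itertools import product

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def violates(t, iad):
    # An IAD is a conjunction of built-ins (attribute, op, constant) on one
    # tuple; the tuple violates it when all of its built-ins hold.
    return all(OPS[op](t[a], c) for a, op, c in iad)


def boundary(op, c):
    # Closest integer that makes the built-in "value op c" false.
    return {"<": c, "<=": c + 1, ">": c, ">=": c - 1}[op]


def local_fixes(t, fixable, iads):
    """Consistent candidates for t at minimum squared distance (brute force)."""
    values = {a: {t[a]} for a in fixable}
    for iad in iads:
        for a, op, c in iad:
            if a in values:
                values[a].add(boundary(op, c))
    best, fixes = None, []
    for combo in product(*(sorted(values[a]) for a in fixable)):
        cand = {**t, **dict(zip(fixable, combo))}
        if any(violates(cand, iad) for iad in iads):
            continue
        d = sum((cand[a] - t[a]) ** 2 for a in fixable)
        if best is None or d < best:
            best, fixes = d, [cand]
        elif d == best:
            fixes.append(cand)
    return fixes


# Selection tuple (a, 0, 0); X, Y restricted to {0, 1}; IAD: not (X < 1 and Y < 1).
t = {"E": "a", "X": 0, "Y": 0}
iads = [[("X", "<", 1), ("Y", "<", 1)],
        [("X", ">", 1)], [("Y", ">", 1)], [("X", "<", 0)], [("Y", "<", 0)]]
print(local_fixes(t, ["X", "Y"], iads))
# [{'E': 'a', 'X': 0, 'Y': 1}, {'E': 'a', 'X': 1, 'Y': 0}]
```

The two results are exactly the two local fixes that the key-repair construction above would have to choose between for this tuple.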

Proof of Proposition 9: The NP-complete PARTITION problem [12] can be reduced to this case for a fixed set of IADs. Let A be a finite set whose elements a have integer sizes s(a). We need to determine whether there exists a subset S' of A such that $\sum_{a \in S'} s(a) = n := (\sum_{a \in A} s(a))/2$.

We use two tables: Set(Element, Weight), with key {Element, Weight}, containing the tuples $(a, s(a))$ for $a \in A$; and Selection(Element, X, Y), with key Element and fixable numerical attributes X, Y (the partition of A) taking values 0 or 1 (which can be specified with IADs), initially containing the tuples $(a, 0, 0)$ for $a \in A$. Finally, we have the IAD $\forall E, X, Y\, \neg(Selection(E, X, Y), X < 1, Y < 1)$.

There is a one-to-one correspondence between LS-fixes of the original database and partitions X, Y of A (collecting the elements with value 1 in either X or Y). Then, there is a partition with the desired property iff the query $\mathcal{Q}: (Set(E, W), Selection(E, X, Y), X = 1, sum(W) = n)$ has answer yes under the brave semantics. The query used in this proof is acyclic and belongs to the class $\mathcal{C}_{Tree}$. □
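The reduction can be exercised by brute force. The Python sketch below (hypothetical helper `brave_partition`, exponential in |A|) enumerates the LS-fixes of Selection as choices of X per element and evaluates the brave answer to the query on two small hypothetical PARTITION instances.

```python
from itertools import product


def brave_partition(sizes):
    """Brave answer to Q for the PARTITION encoding, by brute force.

    sizes maps each element a to s(a). In every LS-fix each Selection tuple
    becomes (a, 1, 0) or (a, 0, 1), so an LS-fix is a choice of X per element.
    """
    total = sum(sizes.values())
    if total % 2:
        return False              # n = total / 2 is not an integer
    n = total // 2
    elems = list(sizes)
    for xs in product((0, 1), repeat=len(elems)):
        selection = {a: (x, 1 - x) for a, x in zip(elems, xs)}   # (X, Y)
        # sum(W) over Set(E, W), Selection(E, X, Y), X = 1
        if sum(sizes[a] for a in elems if selection[a][0] == 1) == n:
            return True
    return False


print(brave_partition({"a": 3, "b": 1, "c": 1, "d": 2, "e": 2, "f": 1}))  # True
print(brave_partition({"a": 1, "b": 1, "c": 4}))                          # False
```

In the first instance n = 5 is reached, e.g., with X = 1 exactly for a and d; in the second, n = 3 is not a subset sum of {1, 1, 4}.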

For the proof of Theorem 8 we need some preliminaries. Let us define a function F, with domain $\mathcal{G} \times S$, where $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a graph and S is a subset of the vertices of $\mathcal{G}$, and with range the non-negative integers. The function is defined as the summation, over all the vertices $v \in S$, of the cubes of the number of edges connecting v to vertices in the complement of S.

Definition 12. Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and a subset of its vertices $S \subseteq \mathcal{V}$:

- $F^v(S, v) = |T(S, v)|^3$, where
$$T(S, v) = \begin{cases} \{v' \mid v' \in (\mathcal{V} \setminus S) \wedge (v, v') \in \mathcal{E}\}, & v \in S,\\ \emptyset, & v \notin S.\end{cases}$$

- $F(\mathcal{G}, S) = \sum_{v \in S} F^v(S, v)$. □

Lemma A.4. Given a fixed regular undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ of degree 3, the maximal value of $F(\mathcal{G}, S)$ over all possible sets $S \subseteq \mathcal{V}$ is $3^3 \times |I|$, for I a maximum independent set. □

Proof: Let us first assume that S is an independent set, not necessarily maximal. In this case $F(\mathcal{G}, S) = 3^3 \times |S|$, because each element $v \in S$ is connected to three vertices in $\mathcal{V} \setminus S$. Then, among independent sets, the maximum value of $F(\mathcal{G}, S)$ is $3^3 \times m$, where m is the maximum cardinality of an independent set.

Let $\mathcal{G}[S] = (S, \mathcal{E}_S)$, where $\mathcal{E}_S$ contains all the edges $(v, v') \in \mathcal{E}$ such that $v, v' \in S$. Now, if S is not an independent set, let $I_S$ be a maximum independent set of $\mathcal{G}[S]$. Every $v \in (S \setminus I_S)$ is adjacent to at least one vertex in $I_S$; otherwise $I_S \cup \{v\}$ would be an independent set contained in S with more vertices than $I_S$, contradicting our choice of $I_S$. Now let us define $F_{ext}(S, v) = F^v(S, v) + \sum_{v' \in (S \setminus I_S),\, (v, v') \in \mathcal{E}} F^v(S, v')$. Since every vertex $v' \in (S \setminus I_S)$ is adjacent to $I_S$, it is easy to see that

$$F(\mathcal{G}, S) \leq \sum_{v \in I_S} F_{ext}(S, v). \qquad (3)$$

We want to prove that $F(\mathcal{G}, S) \leq F(\mathcal{G}, I_S)$. By (3), it is enough to prove that $\sum_{v \in I_S} F_{ext}(S, v) \leq F(\mathcal{G}, I_S)$. Since $F(\mathcal{G}, I_S) = \sum_{v \in I_S} F^v(I_S, v)$, it is sufficient to prove that $F_{ext}(S, v) \leq F^v(I_S, v)$ for every $v \in I_S$. Note that $F^v(I_S, v) = 3^3$, since $I_S$ is independent, and that every $v' \in S' = (S \setminus I_S)$ adjacent to v has at most two neighbours outside S, so $F^v(S, v') \leq 2^3$. If v is adjacent to no vertex in S', all its neighbours lie outside S and $F_{ext}(S, v) = F^v(S, v) = 3^3 = F^v(I_S, v)$. Otherwise, for $v \in I_S$ we have the following cases:

1. If v is adjacent to one vertex in S', then $F_{ext}(S, v) \leq 2^3 + 2^3$ and $F^v(I_S, v) = 3^3$, and therefore $F_{ext}(S, v) \leq F^v(I_S, v) - 11$.

2. If v is adjacent to two vertices in S', then, analogously to item (1), $F_{ext}(S, v) \leq 1^3 + 2 \times 2^3 = F^v(I_S, v) - 10$.

3. If v is adjacent to three vertices in S', then, analogously to item (1), $F_{ext}(S, v) \leq 3 \times 2^3 = F^v(I_S, v) - 3$.

We have thus proved that $F_{ext}(S, v) \leq F^v(I_S, v)$, and therefore that $F(\mathcal{G}, S) \leq F(\mathcal{G}, I_S)$. Since $I_S$ is an independent set, we also know that $F(\mathcal{G}, S) \leq F(\mathcal{G}, I_S) \leq 3^3 \times m$. □
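Lemma A.4 can be sanity-checked by brute force on small cubic graphs. The Python sketch below (an illustration only, exponential in the number of vertices) computes the maximum of $F(\mathcal{G}, S)$ over all subsets S and compares it with $3^3$ times the independence number for $K_4$ and $K_{3,3}$, two 3-regular graphs chosen here as examples.

```python
from itertools import combinations


def F(adj, S):
    # F(G, S) = sum over v in S of |T(S, v)|^3, T(S, v) = neighbours of v outside S.
    return sum(len(adj[v] - S) ** 3 for v in S)


def all_subsets(vertices):
    for r in range(len(vertices) + 1):
        for c in combinations(vertices, r):
            yield set(c)


def check_lemma(vertices, edges):
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    max_f = max(F(adj, S) for S in all_subsets(vertices))
    alpha = max(len(S) for S in all_subsets(vertices)
                if not any(a in S and b in S for a, b in edges))
    return max_f, 3 ** 3 * alpha


K4 = ([0, 1, 2, 3], list(combinations([0, 1, 2, 3], 2)))
K33 = ([0, 1, 2, 3, 4, 5], [(a, b) for a in (0, 1, 2) for b in (3, 4, 5)])
print(check_lemma(*K4), check_lemma(*K33))  # (27, 27) (81, 81)
```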


Proof of Theorem 8: (a) For sum: By reduction from a variation of Independent Set, for graphs whose vertices all have the same degree. It remains NP-hard as a special case of Independent Set for Cubic Planar Graphs [13]. Given an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with degree 3, and a minimum bound k for the size of the maximum independent set, we create a relation $Vertex(V, C_1, C_2)$, where the key V is a vertex and $C_1, C_2$ are fixable and may take values 0 or 1, but are all equal to 0 in the initial instance D. This relation is subject to the denial $IC: \forall V, C_1, C_2\, \neg(Vertex(V, C_1, C_2), C_1 < 1, C_2 < 1)$. D is inconsistent wrt this constraint, and in any of its LS-fixes each vertex v will have an associated tuple $Vertex(v, 1, 0)$ or $Vertex(v, 0, 1)$, but not both. Each LS-fix of the database defines a partition of $\mathcal{V}$ into two subsets: S with $(v, 1, 0)$ and S' with $(v, 0, 1)$, where clearly $S \cup S' = \mathcal{V}$ and $S \cap S' = \emptyset$. Let us define a second relation $Edge(V_1, V_2, W)$, with rigid attributes only, that contains the tuples $(v_1, v_2, 1)$ for $(v_1, v_2) \in \mathcal{E}$ or $(v_2, v_1) \in \mathcal{E}$. Every vertex v appears in each argument in exactly 3 tuples.

Consider the ground aggregate conjunctive query $\mathcal{Q}$:

$$\begin{aligned} q(sum(W_0)) \leftarrow\ & Vertex(V_1, C_{11}, C_{12}), C_{11} = 1,\\ & Edge(V_1, V_2, W_0), Vertex(V_2, C_{21}, C_{22}), C_{21} = 0,\\ & Edge(V_1, V_3, W_1), Vertex(V_3, C_{31}, C_{32}), C_{31} = 0,\\ & Edge(V_1, V_4, W_2), Vertex(V_4, C_{41}, C_{42}), C_{41} = 0. \end{aligned}$$

The query $\mathcal{Q}$ computes the sum of the cubes of the number of vertices of S' adjacent to vertices in S, i.e. it calculates the function from graphs to non-negative numbers corresponding to $F(\mathcal{G}, S)$ from Definition 12, with $\mathcal{Q}(D') = F(\mathcal{G}, S)$ for $D' \in Fix(D, IC)$ and $S = \{v \mid Vertex(v, 1, 0) \in D'\}$.

We are interested in the minimum and maximum values of $\mathcal{Q}$ in $Fix(D, IC)$, i.e. the min-max answer introduced in [3]. Since the function is non-negative and its value is zero for $S = \emptyset$ and for $S = \mathcal{V}$, its minimum value is zero. It remains to find its maximum value.

From Lemma A.4, the answer to query $\mathcal{Q}$ is at most $3^3 \times |I|$, with I a maximum independent set. Consequently, the min-max answer for $\mathcal{Q}$ is $(0, 3^3 \times m)$, with m the cardinality of a maximum independent set; and then there is an independent set of size at least k iff the upper bound of the min-max answer to $\mathcal{Q}$ is $\geq k \times 3^3$.
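The following Python sketch evaluates the body of $\mathcal{Q}$ literally (three independent choices of Edge tuples leaving $V_1$) on every LS-fix of the $K_4$ instance, a 3-regular graph picked for illustration, and checks that the result coincides with $F(\mathcal{G}, S)$. Bag semantics for sum is assumed, and `query_q` is a hypothetical helper name.

```python
from itertools import combinations, product


def query_q(vertex, edge):
    """Brute-force evaluation of q(sum(W0)) under bag semantics.
    vertex: v -> (C1, C2) in one LS-fix; edge: set of (v1, v2, w) tuples."""
    total = 0
    for v1, (c11, _) in vertex.items():
        if c11 != 1:
            continue
        out = [(v2, w) for (a, v2, w) in edge if a == v1]
        # Each choice of three Edge tuples (to V2, V3, V4) is one answer to the body.
        for (v2, w0), (v3, _), (v4, _) in product(out, repeat=3):
            if vertex[v2][0] == 0 and vertex[v3][0] == 0 and vertex[v4][0] == 0:
                total += w0
    return total


V = [0, 1, 2, 3]
E = list(combinations(V, 2))                       # K4 is 3-regular
edge = {(a, b, 1) for a, b in E} | {(b, a, 1) for a, b in E}
adj = {v: {b for a, b, _ in edge if a == v} for v in V}

answers = []
for bits in product((0, 1), repeat=len(V)):
    S = {v for v, bit in zip(V, bits) if bit}
    vertex = {v: ((1, 0) if v in S else (0, 1)) for v in V}   # the LS-fix for S
    q = query_q(vertex, edge)
    assert q == sum(len(adj[v] - S) ** 3 for v in S)         # Q(D') = F(G, S)
    answers.append(q)
print(min(answers), max(answers))                            # 0 27
```

The printed pair is the min-max answer for $K_4$, whose maximum independent sets have size 1.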
(b) For count distinct: By reduction from MAXSAT. Assume that an instance of MAXSAT is given, consisting of a set U of propositional variables, a collection C of clauses over U and a positive integer k. The question is whether at least k clauses can be satisfied simultaneously, which will get answer yes exactly when a question of the form $countd \leq (k - 1)$, with countd defined by an aggregate query over a database instance (both of them to be constructed below), gets answer no under the min-max semantics.

Define a relation $Var(u, c_1, c_2)$, with a (rigid) first key attribute, and the second and third attributes fixable (the denial below and the minimality condition will make them take values 0 or 1). The initial database has a tuple $(u, 0, 0)$ for every $u \in U$. Another relation $Clause(u, c, s)$ has no fixable attributes and contains, for every occurrence of variable $u \in U$ in a clause $c \in C$, a tuple $(u, c, s)$ with s an assignment for u satisfying clause c. The IC is $\forall u, c_1, c_2\, \neg(Var(u, c_1, c_2), c_1 < 1, c_2 < 1)$. The acyclic query is

$$q(countd(c)) \leftarrow Var(u, c_1, c_2), Clause(u, c, s), c_1 = s,$$

where countd denotes the “count distinct” aggregate function. Its answer tells us how many clauses are satisfied in a given LS-fix. The maximum value taken on an LS-fix, i.e. the upper bound of the min-max answer, will be the maximum number of clauses that can be satisfied simultaneously for MAXSAT.
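A brute-force Python sketch of the count-distinct encoding on a small hypothetical MAXSAT instance: it evaluates q on each LS-fix and takes the maximum, which should equal the maximum number of simultaneously satisfiable clauses (3 here, since k2 and k3 force x = y = 0, which falsifies k1).

```python
from itertools import product


def countd_answer(var, clause):
    # q(countd(c)) <- Var(u, c1, c2), Clause(u, c, s), c1 = s
    return len({c for (u, c, s) in clause if var[u][0] == s})


# Hypothetical MAXSAT instance; literals are (variable, satisfying value s).
U = ["x", "y"]
clauses = {
    "k1": [("x", 1), ("y", 1)],   # x or y
    "k2": [("x", 0)],             # not x
    "k3": [("y", 0)],             # not y
    "k4": [("x", 1), ("y", 0)],   # x or not y
}
clause = {(u, c, s) for c, lits in clauses.items() for u, s in lits}

answers = []
for bits in product((0, 1), repeat=len(U)):
    # Each LS-fix turns (u, 0, 0) into (u, 1, 0) or (u, 0, 1); c1 is u's value.
    var = {u: (b, 1 - b) for u, b in zip(U, bits)}
    answers.append(countd_answer(var, clause))
print(answers, max(answers))  # [3, 2, 3, 2] 3
```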

(c) For average: By reduction from 3SAT. We use the same table $Var(u, c_1, c_2)$ and IC as in (b). Now we encode clauses as tuples in a fixed relation $Clause(val, var_1, val_1, var_2, val_2, var_3, val_3)$, where $var_1, var_2, var_3$ are the variables in the clause (in any order), $val_1, val_2, val_3$ range over all possible combinations of truth assignments to those variables (at most 8 combinations per clause), and val is the corresponding truth value of the clause (0 or 1). Now consider the acyclic query

$$q(avg(val)) \leftarrow Clause(val, var_1, val_1, var_2, val_2, var_3, val_3), Var(var_1, val_1, val'_1), Var(var_2, val_2, val'_2), Var(var_3, val_3, val'_3).$$

Then the value of q is maximum in an LS-fix, taking value 1 (i.e. the upper bound of the min-max answer to q is 1), iff the formula is satisfiable. □
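A Python sketch of the average encoding for a 3-CNF formula, with clauses given as hypothetical lists of three (variable, polarity) literals. Since Clause is a set, identical rows produced by different clauses collapse; this can change the average, but not whether it reaches 1.

```python
from itertools import product


def clause_rows(clauses):
    """Rows (val, var1, val1, var2, val2, var3, val3) of the Clause relation."""
    rows = set()
    for lits in clauses:
        names = [v for v, _ in lits]
        for vals in product((0, 1), repeat=3):
            sat = int(any(val == pol for (_, pol), val in zip(lits, vals)))
            rows.add((sat, names[0], vals[0], names[1], vals[1], names[2], vals[2]))
    return rows


def avg_answer(truth, rows):
    # Rows joining with Var(var_i, val_i, _): val_i equals the fix's truth value.
    picked = [r[0] for r in rows
              if truth[r[1]] == r[2] and truth[r[3]] == r[4] and truth[r[5]] == r[6]]
    return sum(picked) / len(picked)


def max_avg(variables, clauses):
    rows = clause_rows(clauses)
    return max(avg_answer(dict(zip(variables, bits)), rows)
               for bits in product((0, 1), repeat=len(variables)))


xyz = ["x", "y", "z"]
sat = [[("x", 1), ("y", 1), ("z", 1)], [("x", 0), ("y", 0), ("z", 0)]]
unsat = [list(zip(xyz, pols)) for pols in product((0, 1), repeat=3)]  # all 8 clauses
print(max_avg(xyz, sat), max_avg(xyz, unsat))  # 1.0 0.5
```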

Proof of Theorem 9: First we reduce CQA under range semantics for aggregate queries with sum to RWAE2, a restricted weighted version of the problem of solving algebraic equations over GF[2], the field with two elements. Next, we prove that such an algebraic problem can be solved within a constant approximation factor.

(A). Reduction to RWAE2. In order to define polynomial equations, we need variables. We introduce a set V of variables $X_i$ taking values in GF[2], one for every tuple $t_i$ in an LS-fix corresponding to a tuple t (a ground database atom in the database) with key k in a relation R in the original database, i.e. $t_i$ belongs to some LS-fix and $t_i$ and t share the key values k. For example, if the tuple t is consistent or admits only one local fix (one attribute can be changed, and in only one way), only one variable is introduced due to t. Denote with bag(t) the set of variables introduced due to the same initial tuple t.

Consider a conjunctive query

$$\mathcal{Q}(sum(z)) \leftarrow R_1(\bar{x}_1), \ldots, R_m(\bar{x}_m).$$



Throughout the proof, $\psi$ is the body of the query as a conjunction of atoms, m is the number of database predicates in $\psi$, n is the number of tuples in the database, and k is the maximal number of attribute comparisons in the ICs (and the maximal number of fixes of a given tuple).

We may consider all the possible assignments $\beta$ from the database atoms in the query to ground tuples in fixes that satisfy $\psi$. The number of assignments is polynomial in the size of the database: at most $(kn)^m$, since each of the m atoms is mapped to one of at most kn local fixes. Notice that the number of LS-fixes of a database may be exponential, but the number of local fixes of each original tuple is restricted by the number of attributes of the tuple. So, the number of all possible local fixes of tuples is polynomial (even linear) in the size of the original database. Here we are using the fact that we have IADs.

Now we build a system $\mathcal{E}$ of weighted algebraic equations. Each such assignment $\beta$ is associated with a combination of tuples $t_1, \ldots, t_m$ satisfying $\psi$. For each combination, we put the following equation over GF[2] into $\mathcal{E}$:

$$\prod_{X_i \text{ selected}} X_i \cdot \prod_{X_j \text{ non-selected}} (1 - X_j) = 1. \qquad (4)$$

The first product in (4) contains the variables corresponding to the tuples selected by $\beta$. The second product contains the variables of the tuples that were not selected but belong to the same bags: if $t_1$ appears in the first product, with $t_1 \in bag(t)$, and $t_2 \in bag(t)$ with $t_1 \neq t_2$, then the variable $X_2$ corresponding to $t_2$ appears as $(1 - X_2)$ in the second product. This captures the restriction that no two different tuples from the same bag can be used (because they share the key values). For each combination $\beta$ of tuples in LS-fixes there is no more than one equation, which in turn has a polynomial number of factors.

Equation (4) gets a weight $w(E_\beta)$ equal to the value of the aggregation attribute z in $\beta$.

In this way we obtain an instance of RWAE2. It requires finding the maximum weight of a subsystem of $\mathcal{E}$ that can be (simultaneously) satisfied in GF[2], where the weight of the subsystem is the sum of the weights of the individual equations. Of course, this problem also has a version as a decision problem, as does CQA under range semantics.

Claim: The maximal weight of a satisfied subsystem of $\mathcal{E}$ is the same as the maximal value of $\mathcal{Q}(sum(z))$ over all possible LS-fixes of D.

$(\geq)$ Assume that query $\mathcal{Q}$ takes its maximum value over all possible LS-fixes of D on an LS-fix D'. Under IADs, a database LS-fix D' is a set union of local fixes, with one local fix selected for every original tuple. Consider an assignment A defined on V that maps the variables corresponding to a selected local fix to 1 and all other variables to 0.

Consider all sets of local fixes in D' that simultaneously satisfy $\psi$. If local fixes $t_1, \ldots, t_m$ satisfy $\psi$, then there exists exactly one equation e for that given set of local fixes. The equation e will be satisfied, since the variables corresponding to selected local fixes have value 1, and the “non-selected” variables have value 0. So, for every set of local fixes satisfying the query body, there is a satisfied equation with weight equal to the value of the aggregated attribute. This means that a solution to the algebraic equation problem is bigger than or equal to the maximal query answer (min-max answer).

$(\leq)$ Consider an assignment A which is a solution of the algebraic equation problem. It maps the elements of V to {0,1} in such a way that the weight of the satisfied equations of $\mathcal{E}$ is maximum over all possible assignments for V.

First we prove that if there exists a bag B such that more than one of its variables is mapped to 1, then there exists an assignment A' with the same weight of satisfied equations of $\mathcal{E}$ as A, but in which B contains no more than one variable mapped to 1.

Assume that for a bag B at least two variables (let us say $X_i, X_j$) are mapped to 1. Then every equation which contains variables from B will be unsatisfied, since it contains either $(1 - X_i)$ or $(1 - X_j)$ as a factor. If we change the value of one of these variables (say $X_i$) to 0, then no satisfied equation becomes unsatisfied, since satisfied equations do not contain $X_i$. No unsatisfied equation becomes satisfied, due to the assumption of maximality of the weight of the satisfied subset of $\mathcal{E}$ for A.

In a second step, we prove that if A is a maximal assignment and there exists a bag B such that all of its variables are mapped to 0, then there exists an assignment A' which satisfies the same subset of $\mathcal{E}$ as A, but in which one variable from B is mapped to 1.

If all variables from a bag B are mapped to 0, then all equations which contain variables from B are unsatisfied. If we change the value of one of these variables to 1, then no satisfied equation becomes unsatisfied, since no satisfied equation contains variables from B. No unsatisfied equation becomes satisfied, due to the maximality assumption on the weight of satisfied equations for A. Processing all the bags of V step by step, for a given maximum assignment A, we produce an assignment A' which has exactly one variable from each bag mapped to 1.

Now, construct a database D' which is the set of local fixes corresponding to the variables mapped to 1. It is obviously an LS-fix, and $w(\mathcal{E}(A)) \leq \mathcal{Q}(D')$.
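The construction of (4) and the claim can be checked on a toy instance with the following Python sketch. The bags, the answers to the query body and their z-values are hypothetical, weights are assumed non-negative, and both maxima are computed by exhaustive search, so this only illustrates the equality, not the efficiency of the reduction.

```python
from itertools import product


def make_equation(selected, bags):
    """Equation (4): the selected variables appear as X, and every other
    variable of the bags they come from appears as (1 - X)."""
    used = [b for b in bags if any(x in b for x in selected)]
    others = [x for b in used for x in b if x not in selected]
    return tuple(selected), others


def satisfied(eq, assignment):
    # A product of factors X and (1 - X) over {0, 1} equals 1 iff every
    # positive variable is 1 and every negated variable is 0.
    pos, neg = eq
    return all(assignment[x] == 1 for x in pos) and all(assignment[x] == 0 for x in neg)


def max_weight(weighted_eqs, variables):
    # Exhaustive RWAE2 optimum (weights assumed non-negative).
    best = 0
    for bits in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, bits))
        best = max(best, sum(w for eq, w in weighted_eqs if satisfied(eq, a)))
    return best


# Hypothetical instance: tuple t has local fixes t1, t2; tuple s has one, s1.
bags = [("t1", "t2"), ("s1",)]
combos = [(("t1", "s1"), 5), (("t2", "s1"), 3)]   # body answers with their z-values
eqs = [(make_equation(sel, bags), w) for sel, w in combos]
variables = [x for b in bags for x in b]

# Maximum of Q(sum(z)) over LS-fixes: choose one local fix per bag.
ls_max = max(sum(w for sel, w in combos if set(sel) <= set(fix))
             for fix in product(*bags))
print(max_weight(eqs, variables), ls_max)  # 5 5
```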

(B). A deterministic approximation algorithm for RWAE2. The construction and the approximation factor obtained are similar to those in the approximation of MAXSAT (cf. [23, 20]). We proceed in two steps: first a randomized algorithm is produced, which is then de-randomized.

(B1). Randomized approximation algorithm. Assume that from each bag B we select one variable with probability $1/|B|$, where $|B| \leq k$ is the number of variables in the bag. We map the selected variable to 1 and all other variables in the bag to 0. For each equation e, the random variable $W_e$ denotes the weight contributed by e to the total weight W. Thus, $W = \sum_{e \in \mathcal{E}} W_e$ and $\mathbb{E}[W_e] = w_e \cdot \Pr[e \text{ is satisfied}]$, where $\mathbb{E}$ is the mathematical expectation and Pr is the probability.

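If the query contains m predicates, then each equation requires at most m variables, from different bags (never two different selected variables from the same bag), to take value 1, and the remaining variables of those bags to take value 0. Such an equation is satisfied exactly when the selected variable is chosen in each of these at most m bags, which happens with probability at least $k^{-m}$; then $\mathbb{E}[W_e] \geq k^{-m} w_e$. Now, by linearity of expectation,

$$\mathbb{E}[W] = \sum_{e \in \mathcal{E}} \mathbb{E}[W_e] \geq k^{-m} \sum_{e \in \mathcal{E}} w_e.$$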
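The bound can be illustrated with exact rational arithmetic on a hypothetical instance. The Python sketch below computes $\mathbb{E}[W]$ from the per-equation probabilities, cross-checks it by enumerating all equally likely selections (one variable per bag), and compares it with $k^{-m} \sum_e w_e$. It assumes that the selected variables of an equation lie in distinct bags.

```python
from fractions import Fraction
from itertools import product

bags = [("t1", "t2"), ("s1",)]
eqs = [(("t1", "s1"), 5), (("t2", "s1"), 3)]   # (selected variables, weight)


def pr_satisfied(selected):
    # e holds iff each of its selected variables is the one chosen in its bag;
    # the (1 - X) factors for the rest of those bags then hold automatically.
    p = Fraction(1)
    for b in bags:
        if any(x in b for x in selected):
            p *= Fraction(1, len(b))
    return p


expected = sum(w * pr_satisfied(sel) for sel, w in eqs)

# Cross-check: average the satisfied weight over all selections.
choices = list(product(*bags))
exact = Fraction(sum(sum(w for sel, w in eqs if set(sel) <= set(c)) for c in choices),
                 len(choices))

k = max(len(b) for b in bags)
m = max(len(sel) for sel, _ in eqs)
bound = Fraction(1, k ** m) * sum(w for _, w in eqs)
print(expected, exact, bound)  # 4 4 2
```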

(B2). De-randomization via conditional expectation. We first establish the following.

Claim: The RWAE2 problem is self-reducible [23, Chap. A.5].

In fact, assume A' is a partial assignment on V such that the variables $X_1, \ldots, X_i$ are mapped to {0,1}. Let $\mathcal{E}^s$ be the set of equations satisfied by A', with total weight $W[\mathcal{E}^s]$, and let $\mathcal{E}^u$ be the set of equations which cannot be satisfied under A'. Let $\mathcal{E}''$ be the set of equations from $\mathcal{E} \setminus (\mathcal{E}^s \cup \mathcal{E}^u)$ in which the variables $X_1, \ldots, X_i$ are replaced by their values. By additivity of the weight function and the independence of the variables, the maximal weight of satisfied equations under an assignment that extends A' is $W[\mathcal{E}^s] + \max W[\mathcal{E}'']$, where $\max W[\mathcal{E}'']$ is a solution of the RWAE2 problem restricted to $\mathcal{E}''$. It is enough to consider self-reducibility trees T such that only one variable from each bag gets value 1 along any path in the tree. This establishes our claim.

Assume that a self-reducibility tree T is given, with each node in it corresponding to a step of the self-reduction. Each node v of T is labelled with $X_1 = a_1, \ldots, X_i = a_i$, a partial assignment of values to the variables $X_1, \ldots, X_i \in V$ associated with the step for v of the self-reduction. Since this is a partial assignment, some of the equations in $\mathcal{E}$ become immediately satisfied, others unsatisfied, and some others undetermined. The latter form a set of equations $\mathcal{E}'$ associated with v, on the variables $V \setminus \{X_1, \ldots, X_i\}$, obtained from $\mathcal{E}$ by giving the variables $X_1, \ldots, X_i$ their values $a_1, \ldots, a_i$. By construction, these equations inherit the weights of the corresponding equations in $\mathcal{E}$.

For example, if the set of equations consists of (1) $yp(1 - x) = 1$, (2) $xz(1 - y) = 1$, (3) $xw(1 - y) = 1$, with variables x, y, z, p, w, and the partial assignment at some step of the self-reduction for v is $x = 1, y = 0, w = 1$, then equation (1) becomes unsatisfiable, (2) is not yet satisfied but satisfiable with an appropriate value for z, and (3) is satisfied. So, $\mathcal{E}'$ contains equation (2), but with x, y replaced by their values 1, 0, respectively.

The conditional expectation of any node v in T can be computed via the set of equations $\mathcal{E}'$ just described. Clearly, the expected weight of the satisfied equations of $\mathcal{E}'$ under a random assignment of values in GF[2] to $V \setminus \{X_1, \ldots, X_i\}$ can be computed in polynomial time. Adding to this the weight of the equations in $\mathcal{E}$ already satisfied by the partial assignment $X_1 = a_1, \ldots, X_i = a_i$ gives the conditional expectation.

Then we compute in polynomial time a path from the root to a leaf such that the conditional expectation of each node on this path is $\geq \mathbb{E}[W]$. This can be done as in the construction in [23, Theorem 16.4].

Consequently, we can find a deterministic approximate solution to the RWAE2 problem in polynomial time. It approximates the optimum solution within a factor of at least $k^{-m}$. This means that we can approximate the maximal value of an aggregate conjunctive query within a factor $k^{-m}$, which depends on the integrity constraints and the query, but not on the size of the database. This ends the proof.



For example, the query with sum used in the proof of NP-hardness in Theorem 8 has m = 4 and k = 2, so it can be approximated within the factor $2^{-4}$.
□
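A Python sketch of the method of conditional expectations. It fixes one whole bag at a time, so that, as required above, exactly one variable per bag gets value 1 along the path, instead of branching on one variable at a time. This bag-wise variant, the helper names and the instance are our own choices for illustration.

```python
from fractions import Fraction


def cond_expectation(partial, bags, eqs):
    """E[W | partial] under the randomized scheme of (B1); `partial` maps a
    bag index to the variable of that bag already fixed to 1."""
    total = Fraction(0)
    for selected, w in eqs:
        p = Fraction(1)
        for i, bag in enumerate(bags):
            need = [x for x in selected if x in bag]
            if not need:
                continue                      # equation does not touch this bag
            if i in partial:
                p *= 1 if partial[i] == need[0] else 0
            else:
                p *= Fraction(1, len(bag))    # still random: uniform choice
        total += w * p
    return total


def derandomize(bags, eqs):
    partial = {}
    for i, bag in enumerate(bags):
        # The best choice never lowers the conditional expectation.
        partial[i] = max(bag, key=lambda x: cond_expectation({**partial, i: x}, bags, eqs))
    return partial, cond_expectation(partial, bags, eqs)


bags = [("t1", "t2"), ("s1",), ("u1", "u2")]
eqs = [(("t1", "s1"), 5), (("t2", "s1"), 3), (("t2", "u2"), 6), (("t1", "u1"), 1)]
start = cond_expectation({}, bags, eqs)
choice, weight = derandomize(bags, eqs)
print(start, choice, weight)  # 23/4 {0: 't2', 1: 's1', 2: 'u2'} 9
```

Once every bag is fixed, the conditional expectation is just the weight of the satisfied equations, and it is never below the starting value $\mathbb{E}[W]$.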


A.2 An Example for Theorem 1 

Consider the diophantine equation

$$2x^3y^2 + 3xy + 105 = x^2y^3 + y^2. \qquad (5)$$

Each term t in it will be represented by a relation R(t) with 8 attributes taking values in $\mathbb{N}$: three, $X_1, X_2, X_3$, for the maximum exponent of x; three, $Y_1, Y_2, Y_3$, for the maximum exponent of y; one, C, for the constant terms; plus a last one, K, for a key. Value 0 for one of the attributes $X_i$ ($Y_i$) indicates that a factor x (y) appears in t; otherwise the attribute gets value 1. We introduce as many tuples in R(t) as the coefficient of the term; they differ only in the key value. We will see that only the 0 values will be subject to fixes. These are the relations and their ICs:

$R(2x^3y^2)$
X1  X2  X3  Y1  Y2  Y3  C  K
0   0   0   1   0   0   1  1
0   0   0   1   0   0   1  2

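For this table we have the following set, $IC(2x^3y^2)$, of ICs:

$$\forall x_1 \cdots x_8\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge x_1 \neq x_2), \quad \forall x_1 \cdots x_8\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge x_2 \neq x_3),$$
$$\forall x_1 \cdots x_8\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge x_5 \neq x_6), \quad \forall x_1 \cdots x_8\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge x_4 \neq 1),$$
$$\forall x_1 \cdots x_{16}\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge R(2x^3y^2)(x_9, \ldots, x_{16}) \wedge x_1 \neq x_9),$$
$$\forall x_1 \cdots x_{16}\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge R(2x^3y^2)(x_9, \ldots, x_{16}) \wedge x_5 \neq x_{13}).$$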

$R(3xy)$

X1  X2  X3  Y1  Y2  Y3  C  K
 1   1   0   1   1   0  1  3
 1   1   0   1   1   0  1  4
 1   1   0   1   1   0  1  5

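$IC(3xy)$:

$\forall x_1 \cdots x_{16}\, \neg(R(3xy)(x_1, \ldots, x_8) \wedge R(3xy)(x_9, \ldots, x_{16}) \wedge x_3 \neq x_{11})$,

$\forall x_1 \cdots x_{16}\, \neg(R(3xy)(x_1, \ldots, x_8) \wedge R(3xy)(x_9, \ldots, x_{16}) \wedge x_6 \neq x_{14})$,

$\forall x_1 \cdots x_8\, \neg(R(3xy)(x_1, \ldots, x_8) \wedge x_1 \neq 1)$, $\forall x_1 \cdots x_8\, \neg(R(3xy)(x_1, \ldots, x_8) \wedge x_2 \neq 1)$,

$\forall x_1 \cdots x_8\, \neg(R(3xy)(x_1, \ldots, x_8) \wedge x_4 \neq 1)$,

$\forall x_1 \cdots x_8\, \neg(R(3xy)(x_1, \ldots, x_8) \wedge x_5 \neq 1)$.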

$R(105)$

X1  X2  X3  Y1  Y2  Y3    C  K
 1   1   1   1   1   1  105  6



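$IC(105)$:

$\forall x_1 \cdots x_8\, \neg(R(105)(x_1, \ldots, x_8) \wedge x_1 \neq 1)$, $\forall x_1 \cdots x_8\, \neg(R(105)(x_1, \ldots, x_8) \wedge x_2 \neq 1)$,

$\forall x_1 \cdots x_8\, \neg(R(105)(x_1, \ldots, x_8) \wedge x_3 \neq 1)$, $\forall x_1 \cdots x_8\, \neg(R(105)(x_1, \ldots, x_8) \wedge x_4 \neq 1)$,

$\forall x_1 \cdots x_8\, \neg(R(105)(x_1, \ldots, x_8) \wedge x_5 \neq 1)$, $\forall x_1 \cdots x_8\, \neg(R(105)(x_1, \ldots, x_8) \wedge x_6 \neq 1)$,

$\forall x_1 \cdots x_8\, \neg(R(105)(x_1, \ldots, x_8) \wedge x_7 \neq 105)$.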

Similar tables $R(x^2y^3)$ and $R(y^2)$ and corresponding sets of ICs are generated
for the terms on the RHS of (5).
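The denials above only compare positions of one tuple, or of two tuples, of the same table, so they can be checked mechanically. In the sketch below, each denial is represented by the predicate inside the negation, and an instance violates it when some tuple (or ordered pair of tuples, including a tuple paired with itself) makes that predicate true. The function and constant names are hypothetical, and positions $x_1, \ldots, x_8$ are mapped to Python indices 0 to 7; only $IC(2x^3y^2)$ is encoded.

```python
from itertools import product
from typing import Callable, List, Tuple

Row = Tuple[int, ...]  # one tuple (x1, ..., x8); Python index = position - 1


def satisfies(rows: List[Row],
              single: List[Callable[[Row], bool]],
              double: List[Callable[[Row, Row], bool]]) -> bool:
    """True iff no tuple t makes a predicate in `single` true (denials
    ¬(R(t) ∧ p(t))) and no ordered pair (t, u) makes a predicate in
    `double` true (denials ¬(R(t) ∧ R(u) ∧ p(t, u)))."""
    if any(p(t) for p in single for t in rows):
        return False
    return not any(p(t, u) for p in double for t, u in product(rows, repeat=2))


# IC(2x^3y^2), one predicate per denial.
IC_SINGLE = [
    lambda t: t[0] != t[1],  # x1 != x2
    lambda t: t[1] != t[2],  # x2 != x3
    lambda t: t[4] != t[5],  # x5 != x6
    lambda t: t[3] != 1,     # x4 != 1
]
IC_DOUBLE = [
    lambda t, u: t[0] != u[0],  # x1 != x9
    lambda t, u: t[4] != u[4],  # x5 != x13
]

if __name__ == "__main__":
    original = [(0, 0, 0, 1, 0, 0, 1, 1), (0, 0, 0, 1, 0, 0, 1, 2)]
    fixed = [(2, 2, 2, 1, 3, 3, 1, 1), (2, 2, 2, 1, 3, 3, 1, 2)]
    mixed = [(2, 2, 2, 1, 3, 3, 1, 1), (5, 5, 5, 1, 3, 3, 1, 2)]
    print(satisfies(original, IC_SINGLE, IC_DOUBLE))  # True
    print(satisfies(fixed, IC_SINGLE, IC_DOUBLE))     # True
    print(satisfies(mixed, IC_SINGLE, IC_DOUBLE))     # False
```

The original table already satisfies $IC(2x^3y^2)$; these denials do not by themselves force a fix, they only constrain what a fix may look like: all occurrences of $x$ must receive one common value, all occurrences of $y$ another, and $Y_1$ must stay 1.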

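Next we need ICs that force all occurrences of $x$, and all occurrences of $y$, to
take the same value across all terms of the equation:

$\forall x_1 \cdots x_{16}\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge R(3xy)(x_9, \ldots, x_{16}) \wedge x_1 \neq x_{11})$,

$\forall x_1 \cdots x_{16}\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge R(3xy)(x_9, \ldots, x_{16}) \wedge x_5 \neq x_{14})$,

$\forall x_1 \cdots x_{16}\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge R(x^2y^3)(x_9, \ldots, x_{16}) \wedge x_1 \neq x_{10})$,

$\forall x_1 \cdots x_{16}\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge R(x^2y^3)(x_9, \ldots, x_{16}) \wedge x_5 \neq x_{12})$,

$\forall x_1 \cdots x_{16}\, \neg(R(2x^3y^2)(x_1, \ldots, x_8) \wedge R(y^2)(x_9, \ldots, x_{16}) \wedge x_5 \neq x_{13})$.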

Now we construct a single table $R(\mathit{equ})$ that represents equation (5) by
appending the previous tables:

$R(\mathit{equ})$

X1  X2  X3  Y1  Y2  Y3    C  K
 0   0   0   1   0   0    1  1
 0   0   0   1   0   0    1  2
 1   1   0   1   1   0    1  3
 1   1   0   1   1   0    1  4
 1   1   0   1   1   0    1  5
 1   1   1   1   1   1  105  6
 1   0   0   0   0   0    1  7
 1   1   1   1   0   0    1  8
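The encoding of a term as tuples, and the appending step that yields $R(\mathit{equ})$, can be sketched as follows. Assumptions: a term is given as a hypothetical triple (coefficient, exponent of $x$, exponent of $y$); the 0s are right-aligned within the X and Y columns, which matches the tables above; the constant term is stored directly in $C$; and keys are assigned consecutively, LHS terms first. The function names are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, ...]        # (X1, X2, X3, Y1, Y2, Y3, C, K)
Term = Tuple[int, int, int]  # (coefficient, exponent of x, exponent of y)

MAX_EXP = 3  # maximum exponent of x and of y in equation (5)


def encode_term(term: Term, first_key: int) -> List[Row]:
    """Rows of R(t): each 0 marks one occurrence of x (X columns) or y (Y columns)."""
    coeff, x_exp, y_exp = term
    xs = [1] * (MAX_EXP - x_exp) + [0] * x_exp
    ys = [1] * (MAX_EXP - y_exp) + [0] * y_exp
    if x_exp == 0 and y_exp == 0:
        # Constant term: a single tuple carrying the constant in column C.
        return [tuple(xs + ys + [coeff, first_key])]
    # One tuple per unit of the coefficient; the tuples differ only in K.
    return [tuple(xs + ys + [1, first_key + i]) for i in range(coeff)]


def build_equ(lhs: List[Term], rhs: List[Term]) -> Tuple[List[Row], int]:
    """Append the R(t) tables with consecutive keys; also return the last LHS key."""
    rows: List[Row] = []
    for term in lhs:
        rows.extend(encode_term(term, len(rows) + 1))
    last_lhs_key = len(rows)
    for term in rhs:
        rows.extend(encode_term(term, len(rows) + 1))
    return rows, last_lhs_key


if __name__ == "__main__":
    lhs = [(2, 3, 2), (3, 1, 1), (105, 0, 0)]  # 2x^3y^2 + 3xy + 105
    rhs = [(1, 2, 3), (1, 0, 2)]               # x^2y^3 + y^2
    rows, last = build_equ(lhs, rhs)
    for r in rows:
        print(r)
    print("last LHS key:", last)  # 6
```

The printed rows coincide with $R(\mathit{equ})$, and the last LHS key, 6, is what separates the two sides in the condition on $x_8$ of the aggregate constraint.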


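We need ICs stating the correspondence between the terms in the tables $R(t)$
and table $R(\mathit{equ})$:

$\forall x_1 \cdots x_{16}\, \neg(R(\mathit{equ})(x_1, \ldots, x_8) \wedge R(2x^3y^2)(x_9, \ldots, x_{16}) \wedge x_8 = x_{16} \wedge x_1 \neq x_9)$,

$\forall x_1 \cdots x_{16}\, \neg(R(\mathit{equ})(x_1, \ldots, x_8) \wedge R(2x^3y^2)(x_9, \ldots, x_{16}) \wedge x_8 = x_{16} \wedge x_2 \neq x_{10})$,

$\ldots$

$\forall x_1 \cdots x_{16}\, \neg(R(\mathit{equ})(x_1, \ldots, x_8) \wedge R(y^2)(x_9, \ldots, x_{16}) \wedge x_8 = x_{16} \wedge x_7 \neq x_{15})$.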

Finally, we have one aggregate constraint that makes the LHS and RHS of
equation (5) equal:

$$\mathit{sum}_{R(\mathit{equ})}(x_1 \cdot x_2 \cdot x_3 \cdot x_4 \cdot x_5 \cdot x_6 \cdot x_7 : x_8 < 7) = \mathit{sum}_{R(\mathit{equ})}(x_1 \cdot x_2 \cdot x_3 \cdot x_4 \cdot x_5 \cdot x_6 \cdot x_7 : x_8 > 6).$$
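To see how this aggregate constraint encodes (5), the sketch below takes the rows of $R(\mathit{equ})$, builds the instance corresponding to a candidate pair $(x, y)$ by replacing every 0 in the X columns by $x$ and every 0 in the Y columns by $y$, and evaluates both sides of the constraint. The helper names are hypothetical; keys 1 to 6 encode the LHS of (5) and keys 7 and 8 the RHS, as in the table.

```python
from math import prod
from typing import List, Tuple

Row = Tuple[int, ...]  # (X1, X2, X3, Y1, Y2, Y3, C, K)

R_EQU: List[Row] = [
    (0, 0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 0, 0, 1, 2),
    (1, 1, 0, 1, 1, 0, 1, 3),
    (1, 1, 0, 1, 1, 0, 1, 4),
    (1, 1, 0, 1, 1, 0, 1, 5),
    (1, 1, 1, 1, 1, 1, 105, 6),
    (1, 0, 0, 0, 0, 0, 1, 7),
    (1, 1, 1, 1, 0, 0, 1, 8),
]


def instance_for(rows: List[Row], x: int, y: int) -> List[Row]:
    """Replace each 0 in the X columns by x and each 0 in the Y columns by y."""
    out = []
    for r in rows:
        xs = [x if v == 0 else v for v in r[0:3]]
        ys = [y if v == 0 else v for v in r[3:6]]
        out.append(tuple(xs + ys + [r[6], r[7]]))
    return out


def aggregate_sides(rows: List[Row]) -> Tuple[int, int]:
    """Sums of x1*...*x7 over the tuples with x8 < 7 and with x8 > 6."""
    lhs = sum(prod(r[:7]) for r in rows if r[7] < 7)
    rhs = sum(prod(r[:7]) for r in rows if r[7] > 6)
    return lhs, rhs


if __name__ == "__main__":
    print(aggregate_sides(R_EQU))                      # (105, 0)
    print(aggregate_sides(instance_for(R_EQU, 2, 3)))  # (267, 117)
```

Each tuple contributes the value of one unit of its term, so the two sums are $2x^3y^2 + 3xy + 105$ and $x^2y^3 + y^2$, and the constraint holds in the instance for $(x, y)$ exactly when $(x, y)$ satisfies (5). The original instance violates it, with sums 105 and 0.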

If the database has an LS-fix, then the common value taken by the fixed
$X$-attributes and the common value taken by the fixed $Y$-attributes give an
integer solution to the diophantine equation. Conversely, if the equation has a
solution $s$, then there is an instance $R(\mathit{equ})'$ corresponding to $s$ that
satisfies the ICs; by Proposition 1, there is an LS-fix of the database.

The reduction could be done with the table $R(\mathit{equ})$ alone, with all the ICs
above referring to this table, but the presentation would be harder to follow.




